ESTIMATING AND UNDERSTANDING 
EXPONENTIAL RANDOM GRAPH MODELS 



SOURAV CHATTERJEE AND PERSI DIACONIS 

Abstract. We introduce a new method for estimating the parameters of exponential
random graph models. The method is based on a large-deviations approximation to
the normalizing constant shown to be consistent using theory developed by Chatterjee
and Varadhan [15]. The theory explains a host of difficulties encountered by applied
workers: many distinct models have essentially the same MLE, rendering the problems
"practically" ill-posed. We give the first rigorous proofs of "degeneracy" observed in these
models. Here, almost all graphs have essentially no edges or are essentially complete. We
supplement recent work of Bhamidi, Bresler and Sly [6] showing that for many models,
the extra sufficient statistics are useless: most realizations look like the results of a simple
Erdos-Renyi model. We also find classes of models where the limiting graphs differ from
Erdos-Renyi graphs and begin to make the link to models where the natural parameters
alternate in sign.



1. Introduction 

Graph and network data are increasingly common and a host of statistical methods
have emerged in recent years. Entry to this large literature may be had from the research
papers and surveys in Fienberg [21, 22]. One mainstay of the emerging theory is the
exponential families

(1.1)    $p_\beta(G) = \exp\Big(\sum_{i=1}^{k} \beta_i T_i(G) - \psi(\beta)\Big)$

where $\beta = (\beta_1, \ldots, \beta_k)$ is a vector of real parameters, $T_1, T_2, \ldots, T_k$ are functions on
the space of graphs (e.g., the number of edges, triangles, stars, cycles, ...), and $\psi$ is a
normalizing constant. In this paper, $T_1$ is usually taken to be the number of edges (or a
constant multiple of it).



We review the literature of these models in Section 2.1. Estimating the parameters
in these models has proved to be a challenging task. First, the normalizing constant
$\psi(\beta)$ is unknown. Second, very different values of $\beta$ can give rise to essentially the same
distribution on graphs.

Here is an example: consider the model on simple graphs with n vertices. 



(1.2)    $p_{\beta_1,\beta_2}(G) = \exp\Big(2\beta_1 E + \frac{6\beta_2}{n}\Delta - n^2\psi_n(\beta_1,\beta_2)\Big)$



where $E$, $\Delta$ denote the number of edges and triangles in the graph $G$. The normalization
of the model ensures non-trivial large $n$ limits. Without scaling, for large $n$, almost all
graphs are empty or full. This model is studied by Strauss [52], Park and Newman [45, 46],
Haggstrom and Jonasson [29], and many others.



Theorems 3.1 and 4.1 will show that for $n$ large and non-negative $\beta_2$,

(1.3)    $\psi_n(\beta_1,\beta_2) \simeq \sup_{0 \le u \le 1}\Big(\beta_1 u + \beta_2 u^3 - \frac{1}{2}u\log u - \frac{1}{2}(1-u)\log(1-u)\Big).$



The maximizing value of the right-hand side is denoted $u^*(\beta_1, \beta_2)$. A plot of this function
appears in Figure 1. Theorem 4.2 shows that for any $\beta_1$ and $\beta_2 > 0$, with high probability,
a pick from $p_{\beta_1,\beta_2}$ is essentially the same as an Erdos-Renyi graph generated by including
edges independently with probability $u^*(\beta_1, \beta_2)$. This phenomenon has previously been
identified by Bhamidi et al. [6] and is discussed further in Section 2.1.

Figure 1. The plot of $u^*$ against $(\beta_1, \beta_2)$. There is a discontinuity on the
left where $u^*$ jumps from near 0 to near 1; this corresponds to a phase
transition. (Picture by Sukhada Fadnavis.)

Figure 2 shows the contour lines for Figure 1. All the $(\beta_1, \beta_2)$ values on the same contour
line lead to the same Erdos-Renyi model in the limit. Simulations show that the asymptotic
results are valid for $n$ as small as 30. Other methods for estimating normalizing constants are
reviewed in Section 2.2.
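The one-dimensional maximization in (1.3) is easy to explore numerically. The following is a minimal sketch, not the authors' code: it evaluates the right-hand side of (1.3) on a uniform grid and returns an approximate maximizer. The function names and the grid size are illustrative choices, and a grid search only locates $u^*$ up to the grid spacing.

```python
import math


def entropy_term(u):
    """I(u) = (1/2) u log u + (1/2)(1-u) log(1-u), with the convention 0 log 0 = 0."""
    total = 0.0
    for x in (u, 1.0 - u):
        if x > 0.0:
            total += 0.5 * x * math.log(x)
    return total


def objective(u, beta1, beta2):
    """Right-hand side of (1.3) before taking the supremum over u."""
    return beta1 * u + beta2 * u ** 3 - entropy_term(u)


def u_star(beta1, beta2, grid_size=100_001):
    """Approximate maximizer and maximum of the objective on a uniform grid in [0, 1]."""
    best_u, best_val = 0.0, objective(0.0, beta1, beta2)
    for i in range(1, grid_size):
        u = i / (grid_size - 1)
        val = objective(u, beta1, beta2)
        if val > best_val:
            best_u, best_val = u, val
    return best_u, best_val
```

When $\beta_2 = 0$ the maximizer is available in closed form, $u^* = e^{2\beta_1}/(1+e^{2\beta_1})$, with maximum $\frac{1}{2}\log(1+e^{2\beta_1})$, which gives a convenient check of the sketch.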

Our development uses the emerging tools of graph limits as developed by Lovasz and
coworkers. We give an overview in Section 2.3. Briefly, a sequence of graphs $G_n$ converges
to a limit if the proportion of edges, triangles, and other small subgraphs in $G_n$ converges.
There is a limiting object and the space of all these limiting objects serves as a useful
compactification of the set of all graphs. Our theory works for functions $T_i(G)$ which are
continuous in this topology. In their study of the large deviations of Erdos-Renyi random
graphs, Chatterjee and Varadhan [15] derived the associated rate functions in the language
of graph limit theory. Their work is crucial in the present development and is reviewed in
Section 2.4.



Our main results are in Section 3 through Section 6. Working with general exponential
models, Section 3 proves an extension of the approximation (1.3) for $\psi_n$ (Theorem 3.1) and
shows that, in the limit, almost all graphs from the model (1.1) are close to graphs where
a certain functional is maximized. As will emerge, sometimes this maximum is taken on
at a unique Erdos-Renyi model. Section 4 studies the problem for the model (1.1) when
$\beta_2, \ldots, \beta_k$ are positive ($\beta_1$ may have any sign). It is shown that the large-deviations
approximation for $\psi_n$ can be easily calculated as a one-dimensional maximization (Theorem
4.1). Further, amplifying the results of Bhamidi et al. [6], it is shown that in these cases,
almost all realizations of the model (1.1) are close to an Erdos-Renyi graph (or perhaps
a finite mixture of Erdos-Renyi graphs) (Theorem 4.2). These mixture cases actually
occur for natural parameter values. This explains a further difficulty found by applied
workers who attempt to estimate parameters by using Monte Carlo to match observed
counts of small subgraphs. Section 5 also gives a careful account of the phase transitions
and near-degeneracies observed in the edge-triangle model (1.2).

Figure 2. Contour lines for Figure 1. All pairs $(\beta_1, \beta_2)$ on the same
contour line correspond to the same value of $u^*$ and hence those models
will correspond to the same Erdos-Renyi model in the limit. The phase
transition region is seen in the upper left-hand corner where all contour
lines converge. (Picture by Sukhada Fadnavis.)

Sections 6, 7 and 8 investigate cases where $\beta_i$ is allowed to be negative. While the general
case remains open (and appears complicated), in Section 6 it is shown that Theorems 4.1
and 4.2 hold as stated if $(\beta_i)_{2 \le i \le k}$ are sufficiently small in magnitude. This requires a
careful study of associated Euler-Lagrange equations. Section 7 shows how the results
change for the model containing edges and triangles when $\beta_2$ is negative. For sufficiently
large negative $\beta_2$, typical realizations look like a random bipartite graph. This is very
different from the Erdos-Renyi model. The result generalizes to other models via an
interesting analogy with the Erdos-Stone theorem from extremal graph theory. Finally,
in Section 8 we discuss a model that exhibits transitivity, an important requirement for
social networks.



2. Background 

This section gives needed background and notation in three areas: exponential graph
models (Section 2.1), graph limits (Section 2.3), and large deviations (Section 2.4). Some
new material is presented as well, e.g., the analysis of Monte Carlo maximum likelihood
in Section 2.2.

2.1. Exponential random graphs. Let $\mathcal{G}_n$ be the space of all simple graphs on $n$ labeled
vertices ("simple" means undirected, with no loops or multiple edges). Thus $\mathcal{G}_n$ contains
$2^{\binom{n}{2}}$ elements. A variety of models in active use can be presented in exponential form

(2.1)    $p_\beta(G) = \exp\Big(\sum_{i=1}^{k} \beta_i T_i(G) - \psi(\beta)\Big)$

where $\beta = (\beta_1, \ldots, \beta_k)$ is a vector of real parameters, $T_1, T_2, \ldots, T_k$ are real-valued
functions on $\mathcal{G}_n$, and $\psi(\beta)$ is a normalizing constant. Usually, $T_i$ are taken to be counts of
various subgraphs, e.g., $T_1(G)$ = # edges in $G$, $T_2(G)$ = # triangles in $G$, .... The main
results of Section 3 work for more general "continuous functions" on graph space, such as
the degree sequence or the eigenvalues of the adjacency matrix. This allows models with
sufficient statistics of the form $\sum_{i=1}^{n} \beta_i d_i(G)$ with $d_i(G)$ the degree of vertex $i$. See, e.g.,
[14].
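For very small $n$ the normalizing constant in (2.1) can be computed exactly by enumerating all $2^{\binom{n}{2}}$ graphs, which makes the definition concrete and also shows why enumeration is hopeless beyond a handful of vertices. The sketch below is illustrative only: it assumes the unscaled model with $T_1$ the number of edges and $T_2$ the number of triangles, and the function name is hypothetical.

```python
import itertools
import math


def log_normalizer(n, beta_edge, beta_triangle):
    """psi(beta) = log of the sum over all simple graphs on n labelled vertices
    of exp(beta_edge * #edges + beta_triangle * #triangles)."""
    pairs = list(itertools.combinations(range(n), 2))
    triples = list(itertools.combinations(range(n), 3))
    log_terms = []
    # Each 0/1 vector of length C(n, 2) encodes one graph.
    for mask in itertools.product((0, 1), repeat=len(pairs)):
        edges = {p for p, bit in zip(pairs, mask) if bit}
        n_tri = sum(
            1
            for a, b, c in triples
            if (a, b) in edges and (a, c) in edges and (b, c) in edges
        )
        log_terms.append(beta_edge * len(edges) + beta_triangle * n_tri)
    # Log-sum-exp for numerical stability.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))
```

With the triangle parameter set to zero the model reduces to independent edges, so the result should equal $\binom{n}{2}\log(1+e^{\beta_1})$; already at $n = 8$ the enumeration would visit $2^{28}$ graphs.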



These exponential models were used by Holland and Leinhardt [32] in the directed case.
Frank and Strauss developed them, showing that if $T_i$ are chosen as edges, triangles,
and stars of various sizes, the resulting random graph edges form a Markov random field. A
general development is in Wasserman and Faust [54]. Newer developments are summarized
in Snijders et al. [51]. Finally, Rinaldo et al. [47] develop the geometric theory for this
class of models with extensive further references.

A major problem in this field is the evaluation of the constant $\psi(\beta)$, which is crucial for
carrying out maximum likelihood and Bayesian inference. As far as we know, there is no
feasible analytic method for approximating $\psi$ when $n$ is large. Physicists have tried the
technique of mean-field approximations; see Park and Newman for the case where
$T_1$ is the number of edges and $T_2$ is the number of two-stars or the number of triangles.
Mean-field approximations have no rigorous foundation, however, and are known to be
unreliable in related models such as spin glasses [53]. For exponential graph models,
Chatterjee and Dey [13] prove that they work for some restricted ranges of $\{\beta_i\}$: values
where the graphs are shown to be essentially Erdos-Renyi graphs (see Theorem 4.2 below
and [6]).

A host of techniques for approximating the normalizing constant using various Monte
Carlo schemes have been proposed. As explained in Section 2.2, these include the MCMLE
procedure of Geyer and Thompson [28] (see example below). The bridge sampling
approach of Gelman and Meng [27] also builds on techniques suggested by physicists to
estimate free energy ($\psi(\beta)$ in our context). The equi-energy sampler of Kou et al. [36] can
also be harnessed to estimate $\psi$.

Alas, at present writing these procedures do not seem very useful. Snijders [50] and
Handcock [31] demonstrate this empirically, with further discussion in [51]. One theoretical
explanation for the poor performance of these techniques comes from the work of Bhamidi
et al. [6]. Most of the algorithms above require a sample from the model (2.1). This is most
often done by using a local Markov chain based on adding or deleting edges (via Metropolis
or Glauber dynamics). These authors show that if the parameters are non-negative, then
for large $n$,

• either the $p_\beta$ model is essentially the same as an Erdos-Renyi model (in which
case the Markov chain mixes in $n^2 \log n$ steps);

• or the Markov chain takes exponential time to mix.

Thus, in cases where the model is not essentially trivial, the Markov chains required to
carry out MCMLE procedures cannot be usefully run to stationarity.

Two other approaches to estimation are worth mentioning. The pseudo-likelihood
approach of Besag [5] is widely used because of its ease of implementation. Its properties
are at best poorly understood: it does not directly maximize the likelihood and,
in empirical comparisons (see, e.g., [17]), has appreciably larger variability than the
MLE. Comets and Janzura [16] prove consistency and asymptotic normality of the maximum
pseudo-likelihood estimator in certain Markov random field models. Chatterjee [12]
shows that it is consistent for estimating the temperature parameter of the Sherrington-
Kirkpatrick model of spin glasses. The second approach is Snijders' [50] suggestion to use
the Robbins-Monro optimization procedure to compute solutions to the moment equations
$E_\beta(T(G)) = T(G^*)$ where $G^*$ is the observed graph. While promising, the approach
requires generating points from $p_\beta$ for arbitrary $\beta$. The only way to do this at present is
by MCMC and the results of [6] suggest this may be impractical.

Practical Remark. One use for the normalizing constant is to enable maximum likelihood
estimates of the $\beta$ parameter in the model (1.1). This requires evaluating $\psi(\beta)$ on a fine
grid in $\beta$ space and then carrying out the maximization by classical methods (e.g., a grid
search). Iterative refinement may be used when homing in on the maximum. The theory
developed below allows for refining the estimate of $\psi(\beta)$ along the following lines. Consider
the situation of Section 4 below where $\beta_2, \ldots, \beta_k$ are positive. Theorem 4.2 shows that the
exponential model is close to an Erdos-Renyi graph with parameter $u^*$ determined by an
equation similar to (1.3). Let $q(G \mid \beta) = \exp\big(\sum_{i=1}^{k} \beta_i T_i(G)\big)$ be the unnormalized density.
Generate independent, identically distributed random graphs $G_i$ from the Erdos-Renyi
model $p_{u^*}$. The estimator

    $\frac{1}{M}\sum_{i=1}^{M} \frac{q(G_i \mid \beta)}{p_{u^*}(G_i)}$

is unbiased for $\exp \psi(\beta)$. Many similar variations can be concocted by combining present
theory with the host of algorithms reviewed by Gelman and Meng [27, Sect. 3.4].



2.2. A simple example. In this section we treat the simplest exponential graph model,
the Erdos-Renyi model. Here the relevant Markov chains for carrying out the Monte Carlo
estimates of normalizing constants described at the end of Section 2.1 can be explicitly
diagonalized and estimates for the variance of various estimators are available in closed
form. The main findings are these: for graphs with $n$ vertices,

• the Metropolis algorithm for sampling from $p_\beta$ converges in order $n^2 \log n$ steps;

• the variance of MCMLE estimates of the normalizing constant is exponential in
$n^2$, rendering them impractical.

The model to be studied is

(2.2)    $p_\beta(G) = z(\beta)^{-1} e^{\beta E(G)}$

for $-\infty < \beta < \infty$ a fixed parameter, $z(\beta)$ the normalizing constant, and $E(G)$ the number
of edges in $G$. This is just the Erdos-Renyi model with edges included independently with
parameter $p = e^\beta / (1 + e^\beta)$. Here, the normalizing constant is

    $z(\beta) = (1 + e^\beta)^{\binom{n}{2}}$

and $\beta > 0$ corresponds to $p > 1/2$. We suppose throughout this section that $\beta > 0$.

A natural Markov chain for generating from $p_\beta$ is the Metropolis algorithm:

(2.3)
• From $G$ pick $1 \le i < j \le n$, uniformly.

• If $(i, j)$ is not in $G$, add this edge.

• If $(i, j)$ is in $G$, delete it with probability $e^{-\beta}$
and leave it with probability $1 - e^{-\beta}$.
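The three steps of (2.3) translate directly into a short simulation. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it starts from the empty graph, discards the first half of the run as burn-in (an arbitrary choice), and reports the time-averaged edge density, which should approach $p = e^\beta/(1+e^\beta)$ when $\beta > 0$.

```python
import math
import random


def metropolis_edge_density(n, beta, steps, seed=0):
    """Run the chain (2.3) for the edge model (2.2) with beta > 0 and
    return the edge density averaged over the second half of the run."""
    rng = random.Random(seed)
    m = n * (n - 1) // 2            # number of potential edges
    present = [False] * m           # start from the empty graph
    n_edges = 0
    delete_prob = math.exp(-beta)   # probability of deleting an existing edge
    burn_in = steps // 2
    density_sum, counted = 0.0, 0
    for step in range(steps):
        k = rng.randrange(m)        # pick a pair i < j uniformly
        if not present[k]:
            present[k] = True       # absent edge: always add it
            n_edges += 1
        elif rng.random() < delete_prob:
            present[k] = False      # present edge: delete with prob e^{-beta}
            n_edges -= 1
        if step >= burn_in:
            density_sum += n_edges / m
            counted += 1
    return density_sum / counted
```

For example, `metropolis_edge_density(20, 1.0, 40000)` can be compared with $e/(1+e) \approx 0.731$.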

Call the transition matrix of this Markov chain $K(G, G')$. The following theorem gives an
explicit spectral decomposition of $K$. It is useful to identify a graph $G$ with the binary
indicator of its edges, a vector $x_G \in \{0,1\}^m$ with $m = \binom{n}{2}$.



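Theorem 2.1. For the Metropolis Markov chain $K$ of (2.3), with $m = \binom{n}{2}$:

(1) $K$ is reversible with stationary distribution $p_\beta(G)$ of (2.2).

(2) For each $\xi \in \{0,1\}^m$ there is an eigenvalue $\beta_\xi$ with eigenfunction $\psi_\xi(x)$ given by

    $\beta_\xi = 1 - \frac{|\xi|(1 + e^{-\beta})}{m}, \qquad \psi_\xi(x) = (-1)^{\xi \cdot x}\, e^{\beta(|\xi| - 2\,\xi \cdot x)/2}.$

Here $|\xi|$ is the number of ones in $\xi$ and $\xi \cdot x$ is the usual inner product. The
eigenfunctions are orthonormal in $L^2(p_\beta)$.

(3) The $L^2(p_\beta)$ or chi-square distance from stationarity, starting at $G \leftrightarrow x_G$, is

    $\chi^2_G(\ell) = \sum_{G'} \frac{(K^\ell(G, G') - p_\beta(G'))^2}{p_\beta(G')} = \sum_{\xi \ne 0} \psi_\xi(x_G)^2\, \beta_\xi^{2\ell}.$

(4) For $0 < \beta < 1$, as $n$ tends to infinity, $\ell^* = \frac{m(\log m + c)}{2(1 + e^{-\beta})}$ steps are necessary and
sufficient to drive $\chi^2_G(\ell)$ to zero:

    $\lim \chi^2_{\emptyset}(\ell^*) = e^{e^{\beta - c}} - 1, \qquad \lim \chi^2_{K_n}(\ell^*) = e^{e^{-\beta - c}} - 1.$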

Proof. For (1), the Metropolis algorithm is reversible by construction [30]. For (2), the
Metropolis chain is a product chain on the product space $\{0,1\}^m$ with component chain

    $\begin{pmatrix} 0 & 1 \\ e^{-\beta} & 1 - e^{-\beta} \end{pmatrix}$

with stationary distribution $\pi(0) = 1/(1 + e^\beta)$, $\pi(1) = e^\beta/(1 + e^\beta)$.

This two-state chain has (right) eigenfunctions/eigenvalues (orthonormal in $L^2(\pi)$)

    $\psi_0(0) = \psi_0(1) = 1, \quad \psi_1(0) = e^{\beta/2}, \quad \psi_1(1) = -e^{-\beta/2},$
    $\beta_0 = 1, \quad \beta_1 = -e^{-\beta}.$

By elementary computations [19, Sect. 6], the product chain has eigenfunctions the product
of these component eigenfunctions/values yielding (2). Formula (3) follows from
elementary spectral theory (see, e.g., [48]). For (4), starting from the empty graph $G = \emptyset$
corresponds to $x_\emptyset = 0$ and then

    $\chi^2_\emptyset(\ell) = \sum_{j=1}^{m} \binom{m}{j} e^{\beta j} \Big(1 - \frac{j(1 + e^{-\beta})}{m}\Big)^{2\ell}.$

Similarly, starting at the complete graph $K_n \leftrightarrow x_{K_n} = (1, \ldots, 1)$ and

    $\chi^2_{K_n}(\ell) = \sum_{j=1}^{m} \binom{m}{j} e^{-\beta j} \Big(1 - \frac{j(1 + e^{-\beta})}{m}\Big)^{2\ell}.$

Now the stated results follow from elementary calculus (for upper bounds) and just using
the first term in the sums above (for the lower bound). □

Note that the right-hand sides of the limits in (4) tend to zero as $c$ tends to $\infty$. Thus
there is a cutoff in convergence at $\ell^*$. More crudely, for the simple model (2.2), order
m log m steps are necessary and sufficient for convergence for all values of /3 and all starting 
states. This remains true for total variation. More complicated models can have more 
complicated mixing behavior pj. The calculations for the Metropolis algorithm can be 
simply adapted for Glauber dynamics with very similar conclusions. 

In applications, Markov chains such as the Metropolis algorithm are used to estimate
normalizing constants or their ratios. Consider an exponential graph model $p_\beta$ (as in
(2.1)) on $\mathcal{G}_n$ with normalizing constant $z(\beta)$. Several estimates of $z(\beta)$ are discussed in
Section 2.1. These include:

Importance sampling. Generate $G_1, G_2, \ldots, G_N$ from a Markov chain with known
stationary distribution $Q(G)$ and use

(2.4)    $\hat{z}_I = \frac{1}{N}\sum_{i=1}^{N} \frac{\exp\{\sum_{j=1}^{k} \beta_j T_j(G_i)\}}{Q(G_i)}.$

This is an unbiased estimate of $z(\beta)$. This requires knowing $Q$. (For example, an Erdos-
Renyi model may be used.) If Q is only known up to a normalizing constant, say Q = zQ, 
then 

Ef=ii/0(G,) 

may be used.

MCMLE. Generate $G_1, G_2, \ldots, G_N$ with stationary distribution $p_{\beta^0}$ and use

(2.5)    $\hat{z}_M = \frac{1}{N}\sum_{j=1}^{N} \exp\Big\{\sum_{i=1}^{k} (\beta_i - \beta_i^0)\, T_i(G_j)\Big\}.$

This is an unbiased estimate of $z(\beta)/z(\beta^0)$.

Acceptance ratio. Generate $G_1, \ldots, G_{N_1}$ with stationary distribution $p_{\beta^0}$ and $G'_1, \ldots, G'_{N_2}$
with stationary distribution $p_\beta$ and use

(2.6)    $\hat{z}_A = \frac{\frac{1}{N_1}\sum_{i=1}^{N_1} \exp\{\sum_j \beta_j T_j(G_i)\}\,\alpha(G_i)}{\frac{1}{N_2}\sum_{i=1}^{N_2} \exp\{\sum_j \beta_j^0 T_j(G'_i)\}\,\alpha(G'_i)}.$

Here $\alpha$ can be any function on graph space. The numerator is an unbiased estimator of
$c/z(\beta^0)$. The denominator is an unbiased estimator of $c/z(\beta)$ with
$c = \sum_{G \in \mathcal{G}_n} \exp\{\sum_{i=1}^{k} (\beta_i + \beta_i^0) T_i(G)\}\,\alpha(G)$. Thus the ratio estimates $z(\beta)/z(\beta^0)$. Common choices of $\alpha(G)$ are the
constant function, or $\alpha(G) = \exp\{\frac{1}{2}\sum_i (\beta_i - \beta_i^0) T_i(G)\}$. See [27] for history and efforts to
optimize $\alpha$.
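For the edge model (2.2) the estimators above can be tried out directly, because $z(\beta) = (1+e^\beta)^m$ is known exactly. The following is a minimal sketch under simplifying assumptions that are not in the text: graphs are drawn independently (not from a Markov chain), only the edge count is tracked since the statistic is $E(G)$, and all function names are hypothetical. It implements the importance sampling estimate with an Erdos-Renyi proposal and the MCMLE estimate (2.5), both on the log scale.

```python
import math
import random


def sample_edge_count(m, p, rng):
    """Edge count of an Erdos-Renyi graph with m potential edges and edge probability p."""
    return sum(1 for _ in range(m) if rng.random() < p)


def log_mean_exp(log_values):
    """log of the average of exp(v), computed stably."""
    top = max(log_values)
    return top + math.log(sum(math.exp(v - top) for v in log_values) / len(log_values))


def log_z_importance(m, beta, p, num_samples, rng):
    """Importance sampling estimate of log z(beta) with proposal Q = Erdos-Renyi(p)."""
    log_weights = []
    for _ in range(num_samples):
        e = sample_edge_count(m, p, rng)
        log_q = beta * e                                        # unnormalized target
        log_prop = e * math.log(p) + (m - e) * math.log(1.0 - p)  # Q(G) for this graph
        log_weights.append(log_q - log_prop)
    return log_mean_exp(log_weights)


def log_ratio_mcmle(m, beta, beta0, num_samples, rng):
    """MCMLE estimate of log(z(beta)/z(beta0)) from independent draws of p_{beta0}."""
    p0 = math.exp(beta0) / (1.0 + math.exp(beta0))
    logs = [(beta - beta0) * sample_edge_count(m, p0, rng) for _ in range(num_samples)]
    return log_mean_exp(logs)


def log_z_exact(m, beta):
    """Exact log z(beta) = m log(1 + e^beta) for the edge model."""
    return m * math.log1p(math.exp(beta))
```

When the proposal probability equals $e^\beta/(1+e^\beta)$ every importance weight equals $z(\beta)$, so the estimate has zero variance; moving the proposal or $\beta^0$ away from the target makes the weights spread out quickly as $m$ grows, which is the effect quantified in the remainder of this section.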

All of these estimators involve things like $E_\beta(f(G))$ with $f(G)$ an exponentially large
function. In the remainder of this section we investigate the variance of these estimates
in the Erdos-Renyi case. To ease notation, suppose that all Markov chains start in
stationarity. Let $K(G, G')$ be a reversible Markov chain on $\mathcal{G}_n$ with stationary distribution
$P(G)$. Suppose that $K$ has eigenvalues $\beta_\xi$ and eigenfunctions $\psi_\xi$ for $\xi \in \{0,1\}^m$. Let $f$ be a
function on $\mathcal{G}_n$. Expand $f(G) = \sum_\xi \hat{f}(\xi)\psi_\xi(x_G)$ with

    $\hat{f}(\xi) = \sum_{G} f(G)\,\psi_\xi(x_G)\,p_\beta(G).$

Let $G_1, G_2, \ldots, G_N$ be a stationary realization from $K$. Proposition 2.1 in shows that
the estimator $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} f(G_i)$ is unbiased with variance

(2.7)    $\mathrm{var}(\hat{\mu}) = \frac{1}{N^2}\sum_{\xi \ne 0} |\hat{f}(\xi)|^2\, W_N(\xi)$

where

    $W_N(\xi) = \frac{N - 2\beta_\xi - N\beta_\xi^2 + 2\beta_\xi^{N+1}}{(1 - \beta_\xi)^2}.$

For large $N$, the asymptotic variance is

(2.8)    $\sigma^2_\infty(\hat{\mu}) := \lim_{N \to \infty} N\,\mathrm{var}(\hat{\mu}) = \sum_{\xi \ne 0} |\hat{f}(\xi)|^2\, \frac{1 + \beta_\xi}{1 - \beta_\xi} \le \frac{2}{1 - \beta_1}\, \|f\|^2_{2,0}.$

Here $\beta_1$ is the second eigenvalue and

    $\|f\|^2_{2,0} = \sum_{\xi \ne 0} |\hat{f}(\xi)|^2 = \sum_{G \in \mathcal{G}_n} f(G)^2\, p_\beta(G) - \Big(\sum_{G \in \mathcal{G}_n} f(G)\, p_\beta(G)\Big)^2.$

For the Erdos-Renyi model (2.2) with the Markov chain (2.3), all the quantities needed
above are available in closed form:



Lemma 2.2. With notation as in Theorem 2.1, let $f(G) = e^{aE(G)}$. Then

    $\hat{f}(\xi) = \frac{e^{\beta|\xi|/2}\,(1 - e^{a})^{|\xi|}\,(1 + e^{a+\beta})^{m - |\xi|}}{(1 + e^{\beta})^{m}}.$



As an example, we compute the usual bound for the asymptotic variance of the MCMLE
estimate (2.5). More precise calculations based on (2.7) do not change the basic message:
the standard deviation is exponentially larger than the mean.

Proposition 2.3. For $\beta > 0$ and $\beta^0 > 0$ in the Erdos-Renyi model (2.2), the MCMLE
estimate for the ratio of normalizing constants (2.5) is unbiased with mean

    $\mu = \Big(\frac{1 + e^{\beta}}{1 + e^{\beta^0}}\Big)^{m}.$

The second eigenvalue is $\beta_1 = 1 - (1 + e^{-\beta^0})/m$. The variance bound is

It follows that, if $\beta^0 \ne \beta$, $\sigma^2_\infty/\mu^2$ tends to $\infty$ exponentially fast as $n$ tends to infinity.



For example if /3o = 2 and /3 = 1 then = (2.2562)"^ and a^Z/i^ = ^itgg [(1-042)™ - 1]. 
If n = 30, (J00//U = 95, 431. If n = 100, the ratio is huge. 



2.3. Graph limits. In a sequence of papers l9l[ini[IIl[25l[371[Ml[MlllQlllIlll21ll3], 

Laszlo Lovasz and coauthors V.T. Sos, B. Szegedy, C. Borgs, J. Chayes, K. Vesztergombi, 
A. Schrijver, and M. Freedman have developed a beautiful, unifying theory of graph limits. 
(See also the related work of Austin [2] and Diaconis and Janson [18] which traces this
back to work of Aldous [1], Hoover [33] and Kallenberg [35].) This sheds light on topics
such as graph homomorphisms, Szemeredi's regularity lemma, quasi-random graphs, graph 
testing and extremal graph theory, and has even found applications in statistics and related 
areas (see e.g., [H]). Their theory has been developed for dense graphs (number of edges 
comparable to the square of number of vertices) but parallel theories for sparse graphs are 
beginning to emerge [7]. 

Lovasz and coauthors define the limit of a sequence of dense graphs as follows. We
quote the definition verbatim from [30] (see also [10, 11, 18]). Let $G_n$ be a sequence of
simple graphs whose number of nodes tends to infinity. For every fixed simple graph $H$, let
$|\hom(H, G)|$ denote the number of homomorphisms of $H$ into $G$ (i.e., edge-preserving maps
$V(H) \to V(G)$, where $V(H)$ and $V(G)$ are the vertex sets). This number is normalized
to get the homomorphism density

(2.9)    $t(H, G) := \frac{|\hom(H, G)|}{|V(G)|^{|V(H)|}}.$

This gives the probability that a random mapping $V(H) \to V(G)$ is a homomorphism.

Note that $|\hom(H, G)|$ is not the count of the number of copies of $H$ in $G$, but is
a constant multiple of that if $H$ is a complete graph. For example, if $H$ is a triangle,
$|\hom(H, G)|$ is the number of triangles in $G$ multiplied by six. On the other hand if $H$ is,
say, a 2-star (i.e. a triangle with one edge missing) and $G$ is a triangle, then the number
of copies of $H$ in $G$ is zero, while $|\hom(H, G)| = 3 \cdot 2 \cdot 2 = 12$.
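The definition (2.9) can be checked by brute force on small graphs. The sketch below is illustrative only (exponential in $|V(H)|$, with hypothetical function names): it enumerates every map $V(H) \to V(G)$ and counts those that send each edge of $H$ to an edge of $G$.

```python
import itertools


def hom_count(h_size, h_edges, g_size, g_edges):
    """Number of edge-preserving maps from H (vertices 0..h_size-1)
    into the simple graph G (vertices 0..g_size-1)."""
    g_adj = set()
    for a, b in g_edges:
        g_adj.add((a, b))
        g_adj.add((b, a))
    count = 0
    for image in itertools.product(range(g_size), repeat=h_size):
        # G has no loops, so adjacent vertices of H must get distinct images.
        if all((image[u], image[v]) in g_adj for u, v in h_edges):
            count += 1
    return count


def hom_density(h_size, h_edges, g_size, g_edges):
    """Homomorphism density t(H, G) of (2.9)."""
    return hom_count(h_size, h_edges, g_size, g_edges) / g_size ** h_size


TRIANGLE = [(0, 1), (1, 2), (0, 2)]
TWO_STAR = [(0, 1), (0, 2)]   # vertex 0 is the centre
```

With $G$ a triangle, `hom_count(3, TRIANGLE, 3, TRIANGLE)` counts the six automorphisms, while for the 2-star the centre has three possible images and each leaf two, giving twelve maps.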

Suppose that the graphs $G_n$ become more and more similar in the sense that $t(H, G_n)$
tends to a limit $t(H)$ for every $H$. One way to define a limit of the sequence $\{G_n\}$ is to
define an appropriate limit object from which the values $t(H)$ can be read off.

The main result of [10] (following the earlier equivalent work of Aldous [1] and Hoover
[33]) is that indeed there is a natural "limit object" in the form of a function $h \in \mathcal{W}$, where
$\mathcal{W}$ is the space of all measurable functions from $[0,1]^2$ into $[0,1]$ that satisfy $h(x, y) =
h(y, x)$ for all $x, y$.

Conversely, every such function arises as the limit of an appropriate graph sequence.
This limit object determines all the limits of subgraph densities: if $H$ is a simple graph
with $V(H) = [k] = \{1, \ldots, k\}$, let

(2.10)    $t(H, h) := \int_{[0,1]^k} \prod_{(i,j) \in E(H)} h(x_i, x_j)\, dx_1 \cdots dx_k.$

Here $E(H)$ denotes the edge set of $H$. A sequence of graphs $\{G_n\}_{n \ge 1}$ is said to converge
to $h$ if for every finite simple graph $H$,

(2.11)    $\lim_{n \to \infty} t(H, G_n) = t(H, h).$

Intuitively, the interval $[0, 1]$ represents a 'continuum' of vertices, and $h(x, y)$ denotes the
probability of putting an edge between $x$ and $y$. For example, for the Erdos-Renyi graph
$G(n, p)$, if $p$ is fixed and $n \to \infty$, then the limit graph is represented by the function that
is identically equal to $p$ on $[0, 1]^2$.

These limit objects, i.e., elements of $\mathcal{W}$, are called "graph limits" or "graphons" in
[10, 11, 40]. A finite simple graph $G$ on $\{1, \ldots, n\}$ can also be represented as a graph limit
$f^G$ in a natural way, by defining

(2.12)    $f^G(x, y) := \begin{cases} 1 & \text{if } (\lceil nx \rceil, \lceil ny \rceil) \text{ is an edge in } G, \\ 0 & \text{otherwise.} \end{cases}$

The definition makes sense because $t(H, f^G) = t(H, G)$ for every simple graph $H$ and
therefore the constant sequence $\{G, G, \ldots\}$ converges to the graph limit $f^G$. Note that
this allows all simple graphs, irrespective of the number of vertices, to be represented as
elements of a single abstract space, namely $\mathcal{W}$.

With the above representation, it turns out that the notion of convergence in terms of
subgraph densities outlined above can be captured by an explicit metric on $\mathcal{W}$, the so-
called cut distance (originally defined for finite graphs by Frieze and Kannan [26]). Start
with the space $\mathcal{W}$ of measurable functions $f(x, y)$ on $[0, 1]^2$ that satisfy $0 \le f(x, y) \le 1$
and $f(x, y) = f(y, x)$. Define the cut distance

(2.13)    $d_\square(f, g) := \sup_{S, T \subseteq [0,1]} \Big| \int_{S \times T} [f(x, y) - g(x, y)]\, dx\, dy \Big|.$

Introduce in $\mathcal{W}$ an equivalence relation: Let $\Sigma$ be the space of measure preserving bijections
$\sigma : [0,1] \to [0,1]$. Say that $f(x, y) \sim g(x, y)$ if $f(x, y) = g_\sigma(x, y) := g(\sigma x, \sigma y)$ for some
$\sigma \in \Sigma$. Denote by $\tilde{g}$ the closure in $(\mathcal{W}, d_\square)$ of the orbit $\{g_\sigma\}$. The quotient space is denoted
by $\widetilde{\mathcal{W}}$ and $\tau$ denotes the natural map $g \to \tilde{g}$. Since $d_\square$ is invariant under $\sigma$ one can define
on $\widetilde{\mathcal{W}}$ the natural distance $\delta_\square$ by

    $\delta_\square(\tilde{f}, \tilde{g}) := \inf_{\sigma} d_\square(f, g_\sigma) = \inf_{\sigma} d_\square(f_\sigma, g) = \inf_{\sigma_1, \sigma_2} d_\square(f_{\sigma_1}, g_{\sigma_2})$

making $(\widetilde{\mathcal{W}}, \delta_\square)$ into a metric space. To any finite graph $G$, we associate $f^G$ as in (2.12)
and its orbit $\tilde{G} = \tau f^G = \tilde{f}^G \in \widetilde{\mathcal{W}}$.
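For two graphs on the same $n$ labelled vertices, the distance $d_\square(f^A, f^B)$ of (2.13) between their step functions can be computed exactly, because the integral is bilinear in the fractions of each block covered by $S$ and $T$, so the supremum is attained at unions of blocks. The sketch below uses that observation; it is a brute-force illustration (exponential in $n$) with hypothetical names, and it computes $d_\square$ for a fixed labelling only, not the quotient distance $\delta_\square$, which would also require an infimum over relabellings.

```python
def cut_distance_labelled(a, b):
    """d_box between the step functions of two graphs given as n x n 0/1
    adjacency matrices (lists of lists) on the same labelled vertex set."""
    n = len(a)
    diff = [[a[i][j] - b[i][j] for j in range(n)] for i in range(n)]
    best = 0
    for s_mask in range(1 << n):          # every subset S of row blocks
        # Column sums of the difference restricted to rows in S.
        col = [
            sum(diff[i][j] for i in range(n) if (s_mask >> i) & 1)
            for j in range(n)
        ]
        # For fixed S the best T takes all positive columns or all negative ones.
        pos = sum(c for c in col if c > 0)
        neg = -sum(c for c in col if c < 0)
        best = max(best, pos, neg)
    return best / n ** 2                  # each block has area 1/n^2
```

For instance, comparing a triangle with the empty graph on three vertices gives $6/9$, attained at $S = T = [0,1]$.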

The papers by Lovasz and coauthors establish many important properties of the metric
space $\widetilde{\mathcal{W}}$ and the associated notion of graph limits. For example, $\widetilde{\mathcal{W}}$ is compact. A pressing
objective is to understand what functions from $\widetilde{\mathcal{W}}$ into $\mathbb{R}$ are continuous. Fortunately, it
is an easy fact that the homomorphism density $t(H, \cdot)$ is continuous for any finite simple
graph $H$ [10, 11]. There are other, more complicated functions that are continuous (see,
e.g., [3]).

2.4. Large deviations for random graphs. Let $G(n, p)$ be the random graph on $n$
vertices where each edge is added independently with probability $p$. This model has been
the subject of extensive investigations since the pioneering work of Erdos and Renyi [20],
yielding a large body of literature (see [8, 34] for partial surveys).

Recently, Chatterjee and Varadhan [15] formulated a large deviation principle for the
Erdos-Renyi graph, in the same way as Sanov's theorem [49] gives a large deviation
principle for an i.i.d. sample. The formulation and proof of this result make extensive
use of the properties of the topology described in Section 2.3.
Let $I_p : [0, 1] \to \mathbb{R}$ be the function

(2.14)    $I_p(u) := \frac{1}{2} u \log\frac{u}{p} + \frac{1}{2}(1 - u)\log\frac{1 - u}{1 - p}.$

The domain of the function $I_p$ can be extended to $\mathcal{W}$ as

(2.15)    $I_p(h) := \int_0^1 \int_0^1 I_p(h(x, y))\, dx\, dy.$

The function $I_p$ can be defined on $\widetilde{\mathcal{W}}$ by declaring $I_p(\tilde{h}) := I_p(h)$ where $h$ is any
representative element of the equivalence class $\tilde{h}$. Of course, this raises the question whether $I_p$ is
well defined on $\widetilde{\mathcal{W}}$. It was proved in [15] that the function $I_p$ is indeed well defined on $\widetilde{\mathcal{W}}$
and is lower semicontinuous under the cut metric $\delta_\square$.
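The rate function is straightforward to evaluate. The sketch below implements (2.14) and, for a graphon that is constant on a $k \times k$ grid of equal blocks, the integral (2.15); representing the graphon as a matrix of block values is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def x_log_ratio(x, q):
    """x * log(x / q) with the convention 0 log 0 = 0 (requires 0 < q)."""
    return 0.0 if x == 0.0 else x * math.log(x / q)


def rate_point(u, p):
    """I_p(u) of (2.14) for u in [0, 1] and p in (0, 1)."""
    return 0.5 * x_log_ratio(u, p) + 0.5 * x_log_ratio(1.0 - u, 1.0 - p)


def rate_step_graphon(block_values, p):
    """I_p(h) of (2.15) for a graphon h that is constant on each cell of a
    k x k grid of equal squares; block_values[i][j] is the value on cell (i, j)."""
    k = len(block_values)
    total = sum(rate_point(block_values[i][j], p) for i in range(k) for j in range(k))
    return total / k ** 2
```

Two quick checks: $I_p$ vanishes at the constant graphon $p$, and $I_{1/2}(u) = \frac{1}{2}u\log u + \frac{1}{2}(1-u)\log(1-u) + \frac{1}{2}\log 2$.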

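The random graph $G(n, p)$ induces probability distributions $P_{n,p}$ on the space $\mathcal{W}$ through
the map $G \to f^G$ and $\tilde{P}_{n,p}$ on $\widetilde{\mathcal{W}}$ through the map $G \to f^G \to \tilde{f}^G = \tilde{G}$. The large deviation
principle for $\tilde{P}_{n,p}$ on $(\widetilde{\mathcal{W}}, \delta_\square)$ is the main result of [15].

Theorem 2.4 (Chatterjee and Varadhan [15]). For each fixed $p \in (0, 1)$, the sequence $\tilde{P}_{n,p}$
obeys a large deviation principle in the space $\widetilde{\mathcal{W}}$ (equipped with the cut metric) with rate
function $I_p$ defined by (2.15). Explicitly, this means that for any closed set $\tilde{F} \subseteq \widetilde{\mathcal{W}}$,

(2.16)    $\limsup_{n \to \infty} \frac{1}{n^2} \log \tilde{P}_{n,p}(\tilde{F}) \le -\inf_{\tilde{h} \in \tilde{F}} I_p(\tilde{h}),$

and for any open set $\tilde{U} \subseteq \widetilde{\mathcal{W}}$,

(2.17)    $\liminf_{n \to \infty} \frac{1}{n^2} \log \tilde{P}_{n,p}(\tilde{U}) \ge -\inf_{\tilde{h} \in \tilde{U}} I_p(\tilde{h}).$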

3. Exponential random graphs 

Let $T : \widetilde{\mathcal{W}} \to \mathbb{R}$ be a bounded continuous function on the metric space $(\widetilde{\mathcal{W}}, \delta_\square)$. Fix $n$
and let $\mathcal{G}_n$ denote the set of simple graphs on $n$ vertices. Then $T$ induces a probability
mass function $p_n$ on $\mathcal{G}_n$ defined as:

    $p_n(G) := e^{n^2 (T(\tilde{G}) - \psi_n)}.$

Here $\tilde{G}$ is the image of $G$ in the quotient space $\widetilde{\mathcal{W}}$ as defined in Section 2.3 and $\psi_n$ is a
constant such that the total mass of $p_n$ is 1. Explicitly,

(3.1)    $\psi_n = \frac{1}{n^2} \log \sum_{G \in \mathcal{G}_n} e^{n^2 T(\tilde{G})}.$

The coefficient $n^2$ is meant to ensure that $\psi_n$ tends to a non-trivial limit as $n \to \infty$. To
describe this limit, define a function $I : [0, 1] \to \mathbb{R}$ as

    $I(u) := \frac{1}{2} u \log u + \frac{1}{2}(1 - u)\log(1 - u)$

and extend $I$ to $\widetilde{\mathcal{W}}$ in the usual manner:

(3.2)    $I(\tilde{h}) := \int_0^1 \int_0^1 I(h(x, y))\, dx\, dy$

where $h$ is a representative element of the equivalence class $\tilde{h}$. As mentioned before, it
follows from a result of [15] that $I$ is well defined and lower semi-continuous on $\widetilde{\mathcal{W}}$. The
following theorem is the first main result of this paper.

Theorem 3.1. If $T : \widetilde{\mathcal{W}} \to \mathbb{R}$ is a bounded continuous function and $\psi_n$ and $I$ are defined
as above, then

    $\psi := \lim_{n \to \infty} \psi_n = \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} (T(\tilde{h}) - I(\tilde{h})).$
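Proof. For each Borel set $\tilde{A} \subseteq \widetilde{\mathcal{W}}$ and each $n$, define

    $\tilde{A}_n := \{\tilde{h} \in \tilde{A} : \tilde{h} = \tilde{G} \text{ for some } G \in \mathcal{G}_n\}.$

Let $\tilde{P}_{n,p}$ be the Erdos-Renyi measure defined in Section 2.4. Note that $\tilde{A}_n$ is a finite set
and

    $|\tilde{A}_n| = 2^{n(n-1)/2}\, \tilde{P}_{n,1/2}(\tilde{A}_n) = 2^{n(n-1)/2}\, \tilde{P}_{n,1/2}(\tilde{A}).$

Thus, if $\tilde{F}$ is a closed subset of $\widetilde{\mathcal{W}}$ then by Theorem 2.4

(3.3)    $\limsup_{n \to \infty} \frac{\log |\tilde{F}_n|}{n^2} \le \frac{\log 2}{2} - \inf_{\tilde{h} \in \tilde{F}} I_{1/2}(\tilde{h}) = -\inf_{\tilde{h} \in \tilde{F}} I(\tilde{h}).$

Similarly if $\tilde{U}$ is an open subset of $\widetilde{\mathcal{W}}$,

(3.4)    $\liminf_{n \to \infty} \frac{\log |\tilde{U}_n|}{n^2} \ge -\inf_{\tilde{h} \in \tilde{U}} I(\tilde{h}).$

Fix $\epsilon > 0$. Since $T$ is a bounded function, there is a finite set $R$ such that the intervals
$\{(a, a + \epsilon) : a \in R\}$ cover the range of $T$. For each $a \in R$, let $\tilde{F}^a := T^{-1}([a, a + \epsilon])$. By
the continuity of $T$, each $\tilde{F}^a$ is closed. Now,

    $e^{n^2 \psi_n} \le \sum_{a \in R} e^{n^2 (a + \epsilon)} |\tilde{F}^a_n| \le |R| \sup_{a \in R} e^{n^2 (a + \epsilon)} |\tilde{F}^a_n|.$

By (3.3), this shows that

    $\limsup_{n \to \infty} \psi_n \le \sup_{a \in R} \Big(a + \epsilon - \inf_{\tilde{h} \in \tilde{F}^a} I(\tilde{h})\Big).$

Each $\tilde{h} \in \tilde{F}^a$ satisfies $T(\tilde{h}) \ge a$. Consequently,

    $\sup_{\tilde{h} \in \tilde{F}^a} (T(\tilde{h}) - I(\tilde{h})) \ge \sup_{\tilde{h} \in \tilde{F}^a} (a - I(\tilde{h})) = a - \inf_{\tilde{h} \in \tilde{F}^a} I(\tilde{h}).$

Substituting this in the earlier display gives

(3.5)    $\limsup_{n \to \infty} \psi_n \le \epsilon + \sup_{a \in R} \sup_{\tilde{h} \in \tilde{F}^a} (T(\tilde{h}) - I(\tilde{h})) = \epsilon + \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} (T(\tilde{h}) - I(\tilde{h})).$

For each $a \in R$, let $\tilde{U}^a := T^{-1}((a, a + \epsilon))$. By the continuity of $T$, $\tilde{U}^a$ is an open set. Note
that

    $e^{n^2 \psi_n} \ge \sup_{a \in R} e^{n^2 a} |\tilde{U}^a_n|.$

Therefore by (3.4), for each $a \in R$

    $\liminf_{n \to \infty} \psi_n \ge a - \inf_{\tilde{h} \in \tilde{U}^a} I(\tilde{h}).$

Each $\tilde{h} \in \tilde{U}^a$ satisfies $T(\tilde{h}) < a + \epsilon$. Therefore,

    $\sup_{\tilde{h} \in \tilde{U}^a} (T(\tilde{h}) - I(\tilde{h})) \le \sup_{\tilde{h} \in \tilde{U}^a} (a + \epsilon - I(\tilde{h})) = a + \epsilon - \inf_{\tilde{h} \in \tilde{U}^a} I(\tilde{h}).$

Together with the previous display, this shows that

(3.6)    $\liminf_{n \to \infty} \psi_n \ge -\epsilon + \sup_{a \in R} \sup_{\tilde{h} \in \tilde{U}^a} (T(\tilde{h}) - I(\tilde{h})) = -\epsilon + \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} (T(\tilde{h}) - I(\tilde{h})).$

Since $\epsilon$ is arbitrary in (3.5) and (3.6), this completes the proof.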



□

Theorem 3.1 gives an asymptotic formula for $\psi_n$. However, it says nothing about the
behavior of a random graph drawn from the exponential random graph model. Some
aspects of this behavior can be described as follows. Let $\tilde{F}^*$ be the subset of $\widetilde{\mathcal{W}}$ where
$T(\tilde{h}) - I(\tilde{h})$ is maximized. By the compactness of $\widetilde{\mathcal{W}}$, the continuity of $T$ and the lower
semi-continuity of $I$, $\tilde{F}^*$ is a non-empty compact set. Let $G_n$ be a random graph on $n$
vertices drawn from the exponential random graph model defined by $T$. The following
theorem shows that for $n$ large, $\tilde{G}_n$ must lie close to $\tilde{F}^*$ with high probability. In particular,
if $\tilde{F}^*$ is a singleton set, then the theorem gives a weak law of large numbers for $\tilde{G}_n$.

Theorem 3.2. Let $\tilde{F}^*$ and $G_n$ be defined as in the above paragraph. Then for any $\eta > 0$
there exist $C, \gamma > 0$ such that for all $n$,

    $\mathbb{P}(\delta_\square(\tilde{G}_n, \tilde{F}^*) > \eta) \le C e^{-n^2 \gamma}.$

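Proof. Take any $\eta > 0$. Let

    $\tilde{A} := \{\tilde{h} : \delta_\square(\tilde{h}, \tilde{F}^*) \ge \eta\}.$

It is easy to see that $\tilde{A}$ is a closed set. By compactness of $\widetilde{\mathcal{W}}$ and $\tilde{F}^*$, and upper semi-
continuity of $T - I$, it follows that

    $2\delta := \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} (T(\tilde{h}) - I(\tilde{h})) - \sup_{\tilde{h} \in \tilde{A}} (T(\tilde{h}) - I(\tilde{h})) > 0.$

Choose $\epsilon = \delta$ and define $\tilde{F}^a$ and $R$ as in the proof of Theorem 3.1. Let $\tilde{A}^a := \tilde{A} \cap \tilde{F}^a$.
Then

    $\mathbb{P}(\tilde{G}_n \in \tilde{A}) \le e^{-n^2 \psi_n} \sum_{a \in R} e^{n^2 (a + \epsilon)} |\tilde{A}^a_n| \le e^{-n^2 \psi_n} |R| \sup_{a \in R} e^{n^2 (a + \epsilon)} |\tilde{A}^a_n|.$

While bounding the last term above, it can be assumed without loss of generality that
$\tilde{A}^a$ is non-empty for each $a \in R$, for the other $a$'s can be dropped without upsetting the
bound. By (3.3) and Theorem 3.1 (noting that $\tilde{A}^a$ is compact), the above display gives

    $\limsup_{n \to \infty} \frac{\log \mathbb{P}(\tilde{G}_n \in \tilde{A})}{n^2} \le \sup_{a \in R} \Big(a + \epsilon - \inf_{\tilde{h} \in \tilde{A}^a} I(\tilde{h})\Big) - \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} (T(\tilde{h}) - I(\tilde{h})).$

Each $\tilde{h} \in \tilde{A}^a$ satisfies $T(\tilde{h}) \ge a$. Consequently,

    $\sup_{\tilde{h} \in \tilde{A}^a} (T(\tilde{h}) - I(\tilde{h})) \ge \sup_{\tilde{h} \in \tilde{A}^a} (a - I(\tilde{h})) = a - \inf_{\tilde{h} \in \tilde{A}^a} I(\tilde{h}).$

Substituting this in the earlier display gives

    $\limsup_{n \to \infty} \frac{\log \mathbb{P}(\tilde{G}_n \in \tilde{A})}{n^2} \le \epsilon + \sup_{a \in R} \sup_{\tilde{h} \in \tilde{A}^a} (T(\tilde{h}) - I(\tilde{h})) - \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} (T(\tilde{h}) - I(\tilde{h}))$
    $= \epsilon + \sup_{\tilde{h} \in \tilde{A}} (T(\tilde{h}) - I(\tilde{h})) - \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} (T(\tilde{h}) - I(\tilde{h}))$
    $= \epsilon - 2\delta = -\delta.$

This completes the proof. □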



4. An Application 

Let $H_1, \ldots, H_k$ be finite simple graphs, where $H_1$ is the complete graph on two vertices (i.e. just a single edge), and each $H_i$ contains at least one edge. Let $\beta_1, \ldots, \beta_k$ be $k$ real numbers. For any $h \in W$, let

\[ T(h) := \sum_{i=1}^k \beta_i \, t(H_i, h) \tag{4.1} \]

where $t(H_i, h)$ is the homomorphism density of $H_i$ in $h$, defined in (2.10). Note that there is nothing special about taking $H_1$ to be a single edge; if we do not want $H_1$ in our sufficient statistic, we just take $\beta_1 = 0$; all theorems would remain valid.

As remarked in Section 2.3, $T$ is continuous with respect to the cut distance on $W$, and hence admits a natural definition on $\widetilde{W}$. Note that for any finite simple graph $G$ that has at least as many nodes as the largest of the $H_i$'s,

\[ T(\widetilde{G}) = \sum_{i=1}^k \beta_i \, t(H_i, G). \]

For example, if $k = 2$, $H_2$ is a triangle, and $G$ has $n \ge 3$ nodes, then

\[ T(\widetilde{G}) = \frac{2\beta_1 (\#\text{edges in } G)}{n^2} + \frac{6\beta_2 (\#\text{triangles in } G)}{n^3}. \]
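The normalization in the triangle example can be checked on a small graph. The following is a minimal Python sketch, not taken from the paper: the function name `edge_triangle_statistic` and the list-of-lists adjacency representation are illustrative assumptions.

```python
from itertools import combinations

def edge_triangle_statistic(adj, beta1, beta2):
    """T(G) = 2*beta1*(#edges)/n^2 + 6*beta2*(#triangles)/n^3 for a simple graph.

    adj is a symmetric 0/1 adjacency matrix (list of lists) with zero diagonal.
    """
    n = len(adj)
    # Count each undirected edge and each triangle exactly once.
    edges = sum(adj[i][j] for i, j in combinations(range(n), 2))
    triangles = sum(adj[i][j] * adj[j][k] * adj[i][k]
                    for i, j, k in combinations(range(n), 3))
    return 2 * beta1 * edges / n**2 + 6 * beta2 * triangles / n**3

# Complete graph on 4 vertices: 6 edges and 4 triangles.
K4 = [[int(i != j) for j in range(4)] for i in range(4)]
value = edge_triangle_statistic(K4, 1.0, 1.0)
```

The factors 2 and 6 count the ordered ways an edge or a triangle can be mapped homomorphically onto itself, which is why they appear next to the unordered counts.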



Let $\psi_n$ be as in (3.1), and let $G_n$ be the $n$-vertex exponential random graph with sufficient statistic $T$. Theorem 3.1 gives a formula for $\lim_{n \to \infty} \psi_n$ as the solution of a variational problem. Surprisingly, the variational problem is explicitly solvable if $\beta_2, \ldots, \beta_k$ are non-negative.

Theorem 4.1. Let $T$, $\psi_n$ and $H_1, \ldots, H_k$ be as above. Suppose $\beta_2, \ldots, \beta_k$ are non-negative. Then

\[ \lim_{n \to \infty} \psi_n = \sup_{0 \le u \le 1} \Bigl( \sum_{i=1}^k \beta_i u^{e(H_i)} - I(u) \Bigr) \tag{4.2} \]

where $I(u) = \frac{1}{2} u \log u + \frac{1}{2} (1 - u) \log(1 - u)$ and $e(H_i)$ is the number of edges in $H_i$. Moreover, each solution of the variational problem of Theorem 3.1 for this $T$ is a constant function, where the constant solves the scalar maximization problem (4.2).
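Because (4.2) is a one-dimensional problem, it can be approximated by brute force. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the function names and the uniform grid search are hypothetical choices, and the grid only locates the maximizer up to the grid spacing.

```python
import math

def entropy_I(u):
    """I(u) = (1/2) u log u + (1/2) (1 - u) log(1 - u), with 0 log 0 read as 0."""
    return sum(0.5 * w * math.log(w) for w in (u, 1.0 - u) if w > 0.0)

def scalar_objective(u, betas, edge_counts):
    """sum_i beta_i * u^{e(H_i)} - I(u), the quantity maximized in (4.2)."""
    return sum(b * u ** e for b, e in zip(betas, edge_counts)) - entropy_I(u)

def maximize_on_grid(betas, edge_counts, grid_size=100001):
    """Grid search over [0, 1]; returns (approximate argmax, objective value)."""
    grid = (i / (grid_size - 1) for i in range(grid_size))
    best_u = max(grid, key=lambda u: scalar_objective(u, betas, edge_counts))
    return best_u, scalar_objective(best_u, betas, edge_counts)

# Edge-triangle model: e(H_1) = 1 and e(H_2) = 3.
u_star, value = maximize_on_grid([-0.1, 0.2], [1, 3])
```

With a single edge term the maximizer has the closed form $e^{2\beta_1}/(1 + e^{2\beta_1})$, which gives a convenient check of the grid search.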



Proof. By Theorem 3.1,

\[ \lim_{n \to \infty} \psi_n = \sup_{h \in W} (T(h) - I(h)). \tag{4.3} \]

By Hölder's inequality,

\[ t(H_i, h) \le \iint h(x, y)^{e(H_i)} \, dx \, dy. \]

Thus, by the non-negativity of $\beta_2, \ldots, \beta_k$,

\[ T(h) \le \beta_1 t(H_1, h) + \sum_{i=2}^k \beta_i \iint h(x, y)^{e(H_i)} \, dx \, dy = \iint \sum_{i=1}^k \beta_i h(x, y)^{e(H_i)} \, dx \, dy. \]

On the other hand, the inequality in the above display becomes an equality if $h$ is a constant function. Therefore, if $u$ is a point in $[0, 1]$ that maximizes

\[ \sum_{i=1}^k \beta_i u^{e(H_i)} - I(u), \]

then the constant function $h(x, y) \equiv u$ solves the variational problem (4.3). To see that constant functions are the only solutions, assume that there is at least one $i$ such that the graph $H_i$ has at least one vertex with two or more neighbors. The above steps show that if $h$ is a maximizer, then for each $i$,

\[ t(H_i, h) = \iint h(x, y)^{e(H_i)} \, dx \, dy. \tag{4.4} \]

In other words, equality holds in Hölder's inequality. By the assumed condition and the criterion for equality in Hölder's inequality, it follows that $h(x, y) = h(y, z)$ for almost every $(x, y, z)$. From this one can easily conclude that $h$ is almost everywhere a constant function.

If the condition does not hold, then each $H_i$ is a union of vertex-disjoint edges. Assume that some $H_i$ has more than one edge. Then again by (4.4) it follows that $h$ must be a constant function.

Finally, if each $H_i$ is just a single edge, then the maximization problem (4.3) can be explicitly solved and the solutions are all constant functions. □

Theorem 4.1 gives the limiting value of $\psi_n$ if $\beta_2, \ldots, \beta_k$ are non-negative. The next theorem describes the behavior of the exponential random graph $G_n$ under this condition if $n$ is large.

Theorem 4.2. For each $n$, let $G_n$ be an $n$-vertex exponential random graph with sufficient statistic $T$ defined in (4.1). Assume that $\beta_2, \ldots, \beta_k$ are non-negative. Then:

(a) If the maximization problem in (4.2) is solved at a unique value $u^*$, then $G_n$ is indistinguishable from the Erdős-Rényi graph $G(n, u^*)$ in the large $n$ limit, in the sense that $G_n$ converges to the constant function $u^*$ in probability as $n \to \infty$.

(b) Even if the maximizer is not unique, the set $U$ of maximizers is a finite subset of $[0, 1]$ and

\[ \min_{u \in U} \delta_\square(\widetilde{G}_n, \tilde u) \to 0 \quad \text{in probability as } n \to \infty \]

where $\tilde u$ denotes the image of the constant function $u$ in $\widetilde{W}$. In other words, $G_n$ behaves like an Erdős-Rényi graph $G(n, u)$ where $u$ is picked randomly from some probability distribution on $U$.

Proof. The assertions about graph limits in this theorem are direct consequences of Theorems 3.2 and 4.1. Since $\sum_{i=1}^k \beta_i u^{e(H_i)}$ is a polynomial function of $u$ and $I(u)$ is sufficiently well-behaved, showing that $U$ is a finite set is a simple analytical exercise. □



It may be noted here that the conclusion of Theorem 4.2 was proved earlier by Bhamidi et al. [6] under certain restrictions on the parameters that they called a 'high temperature condition'. An important observation from [6] is that when $\beta_2, \ldots, \beta_k$ are non-negative, the model satisfies the so-called FKG property [23]. The FKG property has important consequences; for instance, it implies that the expected value of $t(H_i, G)$ is an increasing function of $\beta_j$ for any $i$ and $j$. We will see some further consequences of the FKG property in our proof of Theorem 5.1 in the next section.

5. Phase Transitions and Near-Degeneracy

To illustrate the results of the previous section, recall the exponential random graph model (1.2) with edges and triangles as sufficient statistics:

\[ T(G) = 2\beta_1 \frac{\#\text{edges in } G}{n^2} + 6\beta_2 \frac{\#\text{triangles in } G}{n^3}. \tag{5.1} \]

Let $G_n$ be an $n$-vertex exponential random graph with sufficient statistic $T$. By Theorem 3.1, the probability mass function for this model can be approximated by $\tilde p_{\beta_1, \beta_2}(G) = \exp(n^2 \widetilde{T}_{\beta_1, \beta_2}(G))$ with

\[ \widetilde{T}_{\beta_1, \beta_2}(G) := \inf_{0 \le u \le 1} \widetilde{T}_{\beta_1, \beta_2, G}(u), \]

where

\[ \widetilde{T}_{\beta_1, \beta_2, G}(u) = 2\beta_1 \frac{\#\text{edges in } G}{n^2} + 6\beta_2 \frac{\#\text{triangles in } G}{n^3} - \beta_1 u - \beta_2 u^3 + \frac{1}{2} u \log u + \frac{1}{2} (1 - u) \log(1 - u). \]

The figures below have $n = 30$ and graphs are sampled from $p_{\beta_1, \beta_2}$ using Glauber dynamics run for 10,000 steps. Figure 3 and Figure 4 show contour plots of $\widetilde{T}_{\beta_1, \beta_2}(G)$ as $\beta_1$ and $\beta_2$ vary, fixing a realization of $G$. Figure 5 and Figure 6 illustrate the behavior of $\widetilde{T}_{\beta_1, \beta_2, G}(u)$ as $u$ varies. The captions explain the details.
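The text says only that the graphs were sampled by Glauber dynamics run for 10,000 steps. One way such a sampler could look is sketched below, assuming the target is $p(G) \propto \exp(n^2 T(G))$ with $T$ as in (5.1), so that $n^2 T(G) = 2\beta_1 e(G) + 6\beta_2 \Delta(G)/n$. The function name, the empty starting graph and the seeding are illustrative assumptions, not details from the paper.

```python
import math
import random

def glauber_edge_triangle(n, beta1, beta2, steps, seed=0):
    """Glauber dynamics sketch for p(G) proportional to exp(2*beta1*e(G) + 6*beta2*tri(G)/n).

    Starts from the empty graph (an assumption) and resamples one uniformly
    chosen vertex pair per step from its conditional distribution.
    """
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        # Adding edge {i, j} creates one triangle per common neighbor of i and j.
        common = sum(adj[i][k] & adj[j][k] for k in range(n))
        log_odds = 2.0 * beta1 + 6.0 * beta2 * common / n
        p_edge = 1.0 / (1.0 + math.exp(-log_odds))
        adj[i][j] = adj[j][i] = 1 if rng.random() < p_edge else 0
    return adj

# Parameters of Figure 3, with n = 30 and 10,000 steps as stated in the text.
sample = glauber_edge_triangle(30, -0.45, 0.2, 10000)
```

For very large values of `log_odds` in magnitude, `math.exp` can overflow; a production sampler would clamp the argument first.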
Now fix $\beta_1$ and $\beta_2$ and let

\[ \ell(u) := \beta_1 u + \beta_2 u^3 - I(u) \tag{5.2} \]

where $I(u) = \frac{1}{2} u \log u + \frac{1}{2} (1 - u) \log(1 - u)$, as usual. Let $U$ be the set of maximizers of $\ell(u)$ in $[0, 1]$. Theorem 4.2 describes the limiting behavior of $G_n$ in terms of the set $U$. In particular, if $U$ consists of a single point $u^* = u^*(\beta_1, \beta_2)$, then $G_n$ behaves like the Erdős-Rényi graph $G(n, u^*)$ when $n$ is large.

Figure 3. The contour plot of $\widetilde{T}_{\beta_1, \beta_2}(G)$. Here $G$ is chosen from the distribution given by $\beta_1 = -0.45$, $\beta_2 = 0.2$. Given sample $G$ it is most likely to be chosen from distributions given by parameters not too far from the original parameters $\beta_1, \beta_2$; this indicates that our approximation is good even when $n = 30$. (Picture by Sukhada Fadnavis.)



It is likely that $u^*(\beta_1, \beta_2)$ does not have a closed form expression, other than when $\beta_2 = 0$, in which case

\[ u^*(\beta_1, 0) = \frac{e^{2\beta_1}}{1 + e^{2\beta_1}}. \]

It is, however, quite easy to numerically approximate $u^*(\beta_1, \beta_2)$. Figure 7 plots $u^*(\beta_1, \beta_2)$ versus $\beta_2$ for four different fixed values of $\beta_1$, namely, $\beta_1 = 0.2, -0.35, -0.45$, and $-0.8$. The figures show that $u^*$ is a continuous function of $\beta_2$ as long as $\beta_1$ is not too far down the negative axis.

But for $\beta_1$ below a threshold (e.g., when $\beta_1 = -0.45$), $u^*$ shows a single jump discontinuity in $\beta_2$, signifying a phase transition. In physical terms, this is a first order phase transition, by the following logic. By Theorem 4.2, our random graph behaves like $G(n, u^*)$ when $n$ is large. On the other hand, by a standard computation the expected number of triangles is the first derivative of the free energy $\psi_n$ with respect to $\beta_2$. Therefore in the large $n$ limit, a discontinuity in $u^*$ as a function of $\beta_2$ signifies a discontinuity in the derivative of the limiting free energy, which is the physical definition of a first order phase transition.

Figure 4. The contour plot of $\widetilde{T}_{\beta_1, \beta_2}(G)$. Here $G$ is chosen from the distribution given by $\beta_1 = 0.4$, $\beta_2 = 0.2$. Again, given sample $G$ it is most likely to be chosen from distributions given by parameters not too far from the original parameters $\beta_1, \beta_2$. (Picture by Sukhada Fadnavis.)
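The jump in $u^*$ can be observed numerically by maximizing $\ell$ from (5.2) on a grid for a range of $\beta_2$ values and recording the largest change between neighboring values. This is a rough sketch under stated assumptions: the helper names, grid sizes and the $\beta_2$ range $[0, 1.5]$ are illustrative choices, and when $\ell$ has two global maximizers the grid search simply returns one of them.

```python
import math

def ell(u, beta1, beta2):
    """ell(u) = beta1*u + beta2*u^3 - I(u), as in (5.2), with 0 log 0 read as 0."""
    entropy = sum(0.5 * w * math.log(w) for w in (u, 1.0 - u) if w > 0.0)
    return beta1 * u + beta2 * u**3 - entropy

def u_star(beta1, beta2, grid_size=1001):
    """Approximate maximizer of ell on a uniform grid over [0, 1]."""
    grid = (i / (grid_size - 1) for i in range(grid_size))
    return max(grid, key=lambda u: ell(u, beta1, beta2))

def largest_jump(beta1, beta2_values):
    """Largest change in u_star between consecutive beta2 values."""
    us = [u_star(beta1, b2) for b2 in beta2_values]
    return max(abs(b - a) for a, b in zip(us, us[1:]))

beta2_values = [0.02 * i for i in range(76)]  # 0.00, 0.02, ..., 1.50
smooth = largest_jump(0.2, beta2_values)      # no transition expected
abrupt = largest_jump(-0.8, beta2_values)     # near-degenerate regime
```

A small value of `smooth` next to a large value of `abrupt` is the numerical signature of the discontinuity described above.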

At the point of discontinuity, $\ell(u)$ is maximized at two values of $u$, i.e., the set $U$ consists of two points. Lastly, as $\beta_1$ goes down the negative axis, the model starts to exhibit "near-degeneracy" in the sense of Handcock [31] (see also [45]) as seen in the last frame of Figure 7. This means that as $\beta_2$ varies, the model transitions from being a very sparse graph for low values of $\beta_2$, to a very dense (nearly complete) graph for large values of $\beta_2$, completely skipping all intermediate structures.

The following theorem gives a simple mathematical description of this phenomenon and hence the first rigorous proof of the degeneracy observed in exponential graph models. Related results are in Häggström and Jonasson [29].

Theorem 5.1. Let $G_n$ be an exponential random graph with sufficient statistic $T$ defined in (5.1). Fix any $\beta_1 < 0$. Let

\[ c_1 := \frac{e^{\beta_1}}{1 + e^{\beta_1}}, \qquad c_2 := 1 + \frac{1}{2\beta_1}. \]

Suppose $|\beta_1|$ is so large that $c_1 < c_2$. Let $e(G_n)$ be the number of edges in $G_n$ and let $f(G_n) := e(G_n) / \binom{n}{2}$ be the edge density. Then there exists $q = q(\beta_1) \in [0, \infty)$ such that if $-\infty < \beta_2 < q$, then

\[ \lim_{n \to \infty} P(f(G_n) > c_1) = 0, \]

and if $\beta_2 > q$, then

\[ \lim_{n \to \infty} P(f(G_n) < c_2) = 0. \]

In other words, if $\beta_1$ is a large negative number, then $G_n$ is either sparse (if $\beta_2 < q$) or nearly complete (if $\beta_2 > q$).

Remark. The difference in the values of $c_1$ and $c_2$ can be quite striking even for relatively small values of $\beta_1$. For example, $\beta_1 = -5$ gives $c_1 \approx 0.007$ and $c_2 = 0.9$.
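The numbers in the remark are easy to reproduce. The short sketch below assumes the definitions $c_1 = e^{\beta_1}/(1 + e^{\beta_1})$ and $c_2 = 1 + 1/(2\beta_1)$, as they appear in the proof that follows; the function name is hypothetical.

```python
import math

def degeneracy_thresholds(beta1):
    """Return (c1, c2) for a negative beta1, with c1 and c2 as assumed above."""
    if beta1 >= 0:
        raise ValueError("beta1 must be negative")
    c1 = math.exp(beta1) / (1.0 + math.exp(beta1))
    c2 = 1.0 + 1.0 / (2.0 * beta1)
    return c1, c2

c1, c2 = degeneracy_thresholds(-5.0)
gap_exists = c1 < c2  # the hypothesis of Theorem 5.1
```

Under these definitions, edge densities strictly between `c1` and `c2` are the ones the theorem rules out in the limit.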

Proof. Fix $\beta_1 < 0$ such that $c_1 < c_2$. As a preliminary step, let us prove that for any $\beta_2 > 0$,

\[ \lim_{n \to \infty} P(f(G_n) \in (c_1, c_2)) = 0. \tag{5.3} \]

Fix $\beta_2 > 0$. Let $u$ be any maximizer of $\ell$. Then by Theorem 4.2 it suffices to prove that either $u < e^{\beta_1}/(1 + e^{\beta_1})$ or $u > 1 + 1/(2\beta_1)$. This is proved as follows. Define a function $g : [0, 1] \to \mathbb{R}$ as

\[ g(v) := \ell(v^{1/3}). \]

Then $\ell$ is maximized at $u$ if and only if $g$ is maximized at $u^3$. Since $\ell$ is a bounded continuous function and $\ell'(0) = \infty$, $\ell'(1) = -\infty$, $\ell$ cannot be maximized at 0 or 1. Therefore the same is true for $g$. Let $v$ be a point in $(0, 1)$ at which $g$ is maximized. Then $g''(v) \le 0$. A simple computation shows that

\[ g''(v) = \frac{1}{9 v^{5/3}} \Bigl( -2\beta_1 + \log \frac{v^{1/3}}{1 - v^{1/3}} - \frac{1}{2(1 - v^{1/3})} \Bigr). \]

Thus, with $u = v^{1/3}$, $g''(v) \le 0$ only if

\[ \log \frac{u}{1 - u} \le \beta_1 \quad \text{or} \quad -\frac{1}{2(1 - u)} \le \beta_1. \]

This shows that a maximizer $u$ of $\ell$ must satisfy $u \le c_1$ or $u \ge c_2$. Now, if $u = c_1$, then $u < c_2$, and therefore the above computations show that $g''(v) > 0$, where $v = u^3$. Similarly, if $u = c_2$ then $u > c_1$ and again $g''(v) > 0$. Thus, we have proved that $u < c_1$ or $u > c_2$. By Theorem 3.2, this completes the proof of (5.3) when $\beta_2 > 0$.

Figure 5. The plot of $-\widetilde{T}_{\beta_1, \beta_2, G}(u)$ versus $u$ when $\beta_1$ is fixed at $-0.1$. For this choice of $\beta_1$ there is no phase transition and $\widetilde{T}$ has a unique maximum always. (Picture by Sukhada Fadnavis.)

Now notice that as $\beta_2 \to \infty$, $\sup_{u \le a} \ell(u) \sim \beta_2 a^3$ for any fixed $a \le 1$. This shows that as $\beta_2 \to \infty$, any maximizer of $\ell$ must eventually be larger than $1 + 1/(2\beta_1)$. Therefore, for sufficiently large $\beta_2$,

\[ \lim_{n \to \infty} P(f(G_n) < c_2) = 0. \tag{5.4} \]

Next consider the case $\beta_2 \le 0$. Let $\widetilde{F}^*$ be the set of maximizers of $T(\tilde h) - I(\tilde h)$. Take any $\tilde h \in \widetilde{F}^*$ and let $h$ be a representative element of $\tilde h$. Let $p = e^{2\beta_1}/(1 + e^{2\beta_1})$. An easy verification shows that

\[ T(h) - I(h) = \beta_2 t(H_2, h) - I_p(h) - \frac{1}{2} \log(1 - p), \]

where $I_p(h)$ is defined as in (2.15). Define a new function

\[ h_1(x, y) := \min\{h(x, y), p\}. \]

Since the function $I_p$ defined in (2.14) is minimized at $p$, it follows that for all $x, y \in [0, 1]$, $I_p(h_1(x, y)) \le I_p(h(x, y))$. Consequently, $I_p(h_1) \le I_p(h)$. Again, since $\beta_2 \le 0$ and $h_1 \le h$ everywhere, $\beta_2 t(H_2, h_1) \ge \beta_2 t(H_2, h)$. Combining these observations, we see that $T(h_1) - I(h_1) \ge T(h) - I(h)$. Since $h$ maximizes $T - I$ it follows that equality must hold at every step in the above deductions, from which it is easy to conclude that $h = h_1$ a.e. In other words, $h(x, y) \le p$ a.e. This is true for every $\tilde h \in \widetilde{F}^*$. Since $p < c_1$, the above deduction coupled with Theorem 3.2 proves that when $\beta_2 \le 0$,

\[ \lim_{n \to \infty} P(f(G_n) > c_1) = 0. \tag{5.5} \]

Recalling that $\beta_1$ is fixed, define

\[ a_n(\beta_2) := P(f(G_n) > c_1), \qquad b_n(\beta_2) := P(f(G_n) < c_2). \]

Let $A_n$ and $B_n$ denote the events in brackets in the above display. A simple computation shows that

\[ a_n'(\beta_2) = \frac{6}{n} \mathrm{Cov}(1_{A_n}, \Delta(G_n)) \quad \text{and} \quad b_n'(\beta_2) = \frac{6}{n} \mathrm{Cov}(1_{B_n}, \Delta(G_n)), \]

where $\Delta(G_n)$ is the number of triangles in $G_n$. It is easy to see that the exponential random graph model with $\beta_2 \ge 0$ satisfies the FKG criterion [23]. Therefore the above identities show that on the non-negative axis, $a_n$ is a non-decreasing function and $b_n$ is a non-increasing function.

Let $q_1 := \sup\{x \in \mathbb{R} : \lim_{n \to \infty} a_n(x) = 0\}$. By equation (5.4), $q_1 < \infty$ and by equation (5.5), $q_1 \ge 0$. Similarly, if $q_2 := \inf\{x \in \mathbb{R} : \lim_{n \to \infty} b_n(x) = 0\}$, then $0 \le q_2 < \infty$. Also, clearly, $q_1 \le q_2$ since $a_n + b_n \ge 1$ everywhere. We claim that $q_1 = q_2$. This would complete the proof by the monotonicity of $a_n$ and $b_n$.

Figure 6. The plot of $-\widetilde{T}_{\beta_1, \beta_2, G}(u)$ versus $u$ when $\beta_1$ is fixed at $-0.8$. For this choice of $\beta_1$ there is a phase transition and $-\widetilde{T}$ has two local maxima always. The left one starts as the global maximum; they become equal at the phase transition, then the right maximum becomes the global maximum. This is the jump in the value of $u^*$ observed in Figure 7. (Picture by Sukhada Fadnavis.)

Figure 7. Plot of $u^*(\beta_1, \beta_2)$ on the y-axis vs $\beta_2$ on the x-axis for different fixed values of $\beta_1$: (a) $\beta_1 = 0.2$, (b) $\beta_1 = -0.35$, (c) $\beta_1 = -0.45$, (d) $\beta_1 = -0.8$. Part (c) demonstrates a phase transition. Part (d) demonstrates near-degeneracy.



To prove that $q_1 = q_2$, suppose not. Then $q_1 < q_2$. Then for any $\beta_2 \in (q_1, q_2)$, $\limsup a_n(\beta_2) > 0$ and $\limsup b_n(\beta_2) > 0$. Now,

\[ 0 \le a_n(\beta_2) + b_n(\beta_2) - 1 = P(f(G_n) \in (c_1, c_2)). \]

Therefore by (5.3),

\[ \lim_{n \to \infty} (a_n(\beta_2) + b_n(\beta_2) - 1) = 0. \]

Thus, for any $\beta_2 \in (q_1, q_2)$, $\limsup (1 - b_n(\beta_2)) > 0$. By Theorem 4.2 this implies that the function $\ell$ has a maximum in $[c_2, 1]$. Similarly, for any $\beta_2 \in (q_1, q_2)$, $\limsup (1 - a_n(\beta_2)) > 0$ and therefore the function $\ell$ has a maximum in $[0, c_1]$. Now fix $q_1 < \beta_2 < \tilde\beta_2 < q_2$, and let $\ell$ and $\tilde\ell$ denote the two $\ell$-functions corresponding to $\beta_2$ and $\tilde\beta_2$ respectively. That is,

\[ \ell(u) = \beta_1 u + \beta_2 u^3 - I(u), \qquad \tilde\ell(u) = \beta_1 u + \tilde\beta_2 u^3 - I(u). \]

By the above argument, $\ell$ attains its maximum at some point $u_1 \in [0, c_1]$ and at some point $u_2 \in [c_2, 1]$. (There may be other maxima, but that is irrelevant for us.) Note that

\[ \max_{u \le c_1} \tilde\ell(u) = \max_{u \le c_1} (\ell(u) + (\tilde\beta_2 - \beta_2) u^3) \le \ell(u_1) + (\tilde\beta_2 - \beta_2) c_1^3. \]

On the other hand,

\[ \max_{u \ge c_2} \tilde\ell(u) \ge \tilde\ell(u_2) = \ell(u_2) + (\tilde\beta_2 - \beta_2) u_2^3 \ge \ell(u_2) + (\tilde\beta_2 - \beta_2) c_2^3. \]

Since $\ell(u_1) = \ell(u_2)$, $\tilde\beta_2 > \beta_2$ and $c_2 > c_1$, this shows that

\[ \max_{u \le c_1} \tilde\ell(u) < \max_{u \ge c_2} \tilde\ell(u), \]

contradicting our previous deduction that $\tilde\ell$ has maxima in both $[0, c_1]$ and $[c_2, 1]$. This proves that $q_1 = q_2$. □



6. The Symmetric Phase, Symmetry Breaking, and the Euler-Lagrange Equations

Borrowing terminology from spin glasses, we define the replica symmetric phase or simply the symmetric phase of a variational problem like maximizing $T(h) - I(h)$ as the set of parameter values for which all the maximizers are constant functions. When the parameters are such that all maximizers are non-constant functions we say that the parameter vector is in the region of broken replica symmetry, or simply broken symmetry. There may be another situation, where some optimizers are constant while others are non-constant, although we do not know of such examples. (This third region may be called a region of partial symmetry.)

Statistically, the exponential random graph behaves like an Erdős-Rényi graph in the symmetric region of the parameter space, while such behavior breaks down in the region of broken symmetry. This follows easily from Theorem 3.2.

Theorem 4.2 shows that for the sufficient statistic $T$ defined in (4.1), each $(\beta_1, \ldots, \beta_k)$ in $\mathbb{R} \times \mathbb{R}_+^{k-1}$ falls in the replica symmetric region. Does symmetry hold only when $\beta_2, \ldots, \beta_k$ are non-negative? The following theorem (proven with the aid of the Euler-Lagrange equations of Theorem 6.3 below) shows that this is not the case; $(\beta_1, \ldots, \beta_k)$ is in the replica symmetric region whenever $|\beta_2|, \ldots, |\beta_k|$ are small enough. Of course, this does not supersede Theorem 4.2 since it does not cover large positive values of $\beta_2, \ldots, \beta_k$. However, it proves replica symmetry for small negative values of $\beta_2, \ldots, \beta_k$, which is not covered by Theorem 4.2.



Theorem 6.1. Consider the exponential random graph with sufficient statistic $T$ defined in (4.1). Suppose $\beta_1, \ldots, \beta_k$ are such that

\[ \sum_{i=2}^k |\beta_i| \, e(H_i) (e(H_i) - 1) < 2 \]

where $e(H_i)$ is the number of edges in $H_i$. Then the conclusions of Theorems 4.1 and 4.2 hold true for this value of the parameter vector $(\beta_1, \ldots, \beta_k)$.
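The hypothesis of Theorem 6.1 is a simple arithmetic condition, so it can be checked directly for a given parameter vector. The helper below is a hypothetical illustration; it assumes the first entries of `betas` and `edge_counts` correspond to $H_1$, which does not enter the sum.

```python
def satisfies_theorem_6_1(betas, edge_counts):
    """Check sum_{i>=2} |beta_i| * e(H_i) * (e(H_i) - 1) < 2.

    betas[0] and edge_counts[0] belong to H_1 (a single edge) and are skipped.
    """
    total = sum(abs(b) * e * (e - 1)
                for b, e in zip(betas[1:], edge_counts[1:]))
    return total < 2

# Edge-triangle model: e(H_2) = 3, so the condition reads 6*|beta_2| < 2.
small_negative = satisfies_theorem_6_1([-1.0, -0.3], [1, 3])
too_large = satisfies_theorem_6_1([-1.0, -0.4], [1, 3])
```

For the edge-triangle model the condition reduces to $|\beta_2| < 1/3$, regardless of $\beta_1$.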

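Proof. It suffices to prove that the maximizer of $T(h) - I(h)$ as $h$ varies over $W$ is unique. This is because: if $h$ is a maximizer, then so is $h_\sigma(x, y) := h(\sigma x, \sigma y)$ for any measure preserving bijection $\sigma : [0, 1] \to [0, 1]$. The only functions that are invariant under such transforms are constant functions.

Let $\Delta_H$ be the operator defined in Section 6.2 below. Let $\|\cdot\|_\infty$ denote the $L^\infty$ norm on $W$ (that is, the essential supremum of the absolute value). Let $h$ and $g$ be two maximizers of $T - I$. For any finite simple graph $H$, a simple computation shows that

\[ \|\Delta_H h - \Delta_H g\|_\infty \le \sum_{(r,s) \in E(H)} \|\Delta_{H,r,s} h - \Delta_{H,r,s} g\|_\infty \le e(H) (e(H) - 1) \|h - g\|_\infty. \]

Using the above inequality, Theorem 6.3 and the inequality

\[ \Bigl| \frac{e^x}{1 + e^x} - \frac{e^y}{1 + e^y} \Bigr| \le \frac{|x - y|}{4} \]

(easily proved by the mean value theorem) it follows that for almost all $x, y$,

\[ |h(x, y) - g(x, y)| = \Bigl| \frac{e^{2 \sum_{i=1}^k \beta_i \Delta_{H_i} h(x, y)}}{1 + e^{2 \sum_{i=1}^k \beta_i \Delta_{H_i} h(x, y)}} - \frac{e^{2 \sum_{i=1}^k \beta_i \Delta_{H_i} g(x, y)}}{1 + e^{2 \sum_{i=1}^k \beta_i \Delta_{H_i} g(x, y)}} \Bigr| \]
\[ \le \frac{1}{2} \sum_{i=1}^k |\beta_i| \, \|\Delta_{H_i} h - \Delta_{H_i} g\|_\infty \le \frac{1}{2} \|h - g\|_\infty \sum_{i=1}^k |\beta_i| \, e(H_i) (e(H_i) - 1). \]

If the coefficient of $\|h - g\|_\infty$ in the last expression is strictly less than 1, it follows that $h$ must be equal to $g$ a.e. □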



6.1. Symmetry breaking. Theorems 4.2 and 6.1 establish various regions of symmetry in the exponential random graph model with sufficient statistic $T$ defined in (4.1). That leaves the question: is there a region where symmetry breaks? We specialize to the simple case where $k = 2$ and $H_2$ is a triangle, i.e., the example of Section 5. In this case, it turns out that replica symmetry breaks whenever $\beta_2$ is less than a sufficiently large negative number depending on $\beta_1$.

Theorem 6.2. Consider the exponential random graph with sufficient statistic $T$ defined in (5.1). Then for any given value of $\beta_1$, there is a positive constant $C(\beta_1)$ sufficiently large so that whenever $\beta_2 < -C(\beta_1)$, $T(h) - I(h)$ is not maximized at any constant function. Consequently, if $G_n$ is an $n$-vertex exponential random graph with this sufficient statistic, then there exists $\epsilon > 0$ such that

\[ \lim_{n \to \infty} P(\delta_\square(\widetilde{G}_n, \widetilde{C}) > \epsilon) = 1 \]

where $\widetilde{C}$ is the set of constant functions. In other words, $G_n$ does not look like an Erdős-Rényi graph in the large $n$ limit.



Proof. Fix $\beta_1$. Let $p = e^{2\beta_1}/(1 + e^{2\beta_1})$ and $\gamma := -\beta_2$, so that for any $h \in W$,

\[ T(h) - I(h) = -\gamma t(H_2, h) - I_p(h) - \frac{1}{2} \log(1 - p). \]

Assume without loss of generality that $\beta_2 < 0$. Suppose $u$ is a constant such that the function $h(x, y) \equiv u$ maximizes $T(h) - I(h)$, i.e., minimizes $\gamma t(H_2, h) + I_p(h)$. Note that

\[ \gamma t(H_2, h) + I_p(h) = \gamma u^3 + I_p(u). \]

Clearly, the definition of $u$ implies that $\gamma u^3 + I_p(u) \le \gamma x^3 + I_p(x)$ for all $x \in [0, 1]$. This implies that $u$ must be in $(0, 1)$, because the derivative of $x \mapsto \gamma x^3 + I_p(x)$ is $-\infty$ at 0 and $\infty$ at 1. Thus,

\[ 0 = \frac{d}{dx} (\gamma x^3 + I_p(x)) \Big|_{x = u} = 3\gamma u^2 + \frac{1}{2} \log \frac{u}{1 - u} - \frac{1}{2} \log \frac{p}{1 - p}, \]

which shows that $u \le c(\gamma)$, where $c(\gamma)$ is a function of $\gamma$ such that

\[ \lim_{\gamma \to \infty} c(\gamma) = 0. \]

This shows that

\[ \lim_{\gamma \to \infty} \min_{0 \le x \le 1} (\gamma x^3 + I_p(x)) = I_p(0) = \frac{1}{2} \log \frac{1}{1 - p}. \tag{6.1} \]

Next let $g$ be the function

\[ g(x, y) := \begin{cases} 0 & \text{if } x, y \text{ are on the same side of } 1/2, \\ p & \text{if not.} \end{cases} \]

Clearly, for almost all $(x, y, z)$, $g(x, y) g(y, z) g(z, x) = 0$. Thus, $t(H_2, g) = 0$. A simple computation shows that

\[ I_p(g) = \frac{1}{4} \log \frac{1}{1 - p}. \]

Thus, $\gamma t(H_2, g) + I_p(g) = \frac{1}{4} \log \frac{1}{1 - p}$. This shows that if $\gamma$ is large enough (depending on $p$ and hence $\beta_1$), then $T - I$ cannot be maximized at a constant function. The rest of the conclusion follows easily from Theorem 3.2 and the compactness of $\widetilde{W}$. □
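The comparison at the heart of this proof can be reproduced numerically: the best constant function costs $\min_x (\gamma x^3 + I_p(x))$, while the two-block function $g$ costs $\frac{1}{4} \log \frac{1}{1-p}$. The sketch below assumes $I_p(x) = \frac{1}{2} x \log(x/p) + \frac{1}{2} (1 - x) \log((1 - x)/(1 - p))$, which is consistent with the value $I_p(0)$ in (6.1) but is not restated in this section; all function names are illustrative.

```python
import math

def I_p(x, p):
    """Assumed form of I_p: (1/2) x log(x/p) + (1/2) (1-x) log((1-x)/(1-p))."""
    total = 0.0
    if x > 0.0:
        total += 0.5 * x * math.log(x / p)
    if x < 1.0:
        total += 0.5 * (1.0 - x) * math.log((1.0 - x) / (1.0 - p))
    return total

def best_constant_cost(gamma, p, grid_size=10001):
    """Grid approximation of min over x in [0, 1] of gamma*x^3 + I_p(x)."""
    grid = (i / (grid_size - 1) for i in range(grid_size))
    return min(gamma * x**3 + I_p(x, p) for x in grid)

def two_block_cost(p):
    """gamma*t(H_2, g) + I_p(g) for the two-block function g; t(H_2, g) = 0."""
    return 0.25 * math.log(1.0 / (1.0 - p))

def constant_is_beaten(beta1, gamma):
    """True if the two-block g has strictly lower cost than every constant."""
    p = math.exp(2.0 * beta1) / (1.0 + math.exp(2.0 * beta1))
    return two_block_cost(p) < best_constant_cost(gamma, p)

broken = constant_is_beaten(0.0, 50.0)     # beta_2 = -50
undecided = constant_is_beaten(0.0, 1.0)   # beta_2 = -1
```

A `False` result only means that this particular $g$ does not beat the constants; it does not by itself show that the maximizer is constant.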



6.2. Euler-Lagrange equations. We return to the exponential random graph model with sufficient statistic $T$ defined in (4.1) in terms of the densities of $k$ fixed graphs $H_1, \ldots, H_k$, where $H_1$ is a single edge. Theorems 4.1 and 4.2 analyze this model when $\beta_2, \ldots, \beta_k$ are non-negative. What if they are not? One can still try to derive the Euler-Lagrange equations for the related variational problem of maximizing $T(h) - I(h)$. The following theorem presents the outcome of this effort.

For a finite simple graph $H$, let $V(H)$ and $E(H)$ denote the sets of vertices and edges of $H$. Given a symmetric measurable function $h : [0, 1]^2 \to \mathbb{R}$, for each $(r, s) \in E(H)$ and each pair of points $x_r, x_s \in [0, 1]$, define

\[ \Delta_{H,r,s} h(x_r, x_s) := \int_{[0,1]^{V(H) \setminus \{r, s\}}} \prod_{\substack{(r', s') \in E(H) \\ (r', s') \ne (r, s)}} h(x_{r'}, x_{s'}) \prod_{\substack{v \in V(H) \\ v \ne r, s}} dx_v. \]

For $x, y \in [0, 1]$ define

\[ \Delta_H h(x, y) := \sum_{(r, s) \in E(H)} \Delta_{H,r,s} h(x, y). \tag{6.2} \]

For example, when $H$ is a triangle, then $V(H) = \{1, 2, 3\}$ and

\[ \Delta_{H,1,2} h(x, y) = \Delta_{H,1,3} h(x, y) = \Delta_{H,2,3} h(x, y) = \int_0^1 h(x, z) h(y, z) \, dz \]

and therefore $\Delta_H h(x, y) = 3 \int_0^1 h(x, z) h(y, z) \, dz$. When $H$ contains exactly one edge, define $\Delta_H h \equiv 1$ for any $h$, by the usual convention that the empty product is 1.
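For step functions the triangle operator reduces to a finite sum, which makes the example easy to compute. The following sketch assumes $h$ is constant on an $m \times m$ array of equal-width blocks, so that the integral over $z$ becomes an average over blocks; the name `triangle_delta` and this representation are illustrative assumptions.

```python
def triangle_delta(blocks):
    """Delta_H h for H a triangle and h a step function on m equal-width blocks.

    blocks[a][b] is the value of h on block (a, b). The result is again an
    m x m array: 3 * (1/m) * sum_c blocks[a][c] * blocks[b][c].
    """
    m = len(blocks)
    return [[3.0 * sum(blocks[a][c] * blocks[b][c] for c in range(m)) / m
             for b in range(m)]
            for a in range(m)]

# Constant function u = 0.5: Delta_H h should equal 3 * u^2 everywhere.
constant_case = triangle_delta([[0.5]])

# Two-block function equal to 0 within blocks and p = 0.4 across blocks.
p = 0.4
two_block_case = triangle_delta([[0.0, p], [p, 0.0]])
```

For the two-block function, the operator is positive within blocks and zero across blocks, since two points in different blocks have no common "neighbor" block.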



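Theorem 6.3. Let $T : \widetilde{W} \to \mathbb{R}$ be defined as in (4.1) and the operator $\Delta_H$ be defined as in (6.2). If $\tilde h \in \widetilde{W}$ maximizes $T(\tilde h) - I(\tilde h)$, then any representative element $h \in \tilde h$ must satisfy for almost all $(x, y) \in [0, 1]^2$,

\[ h(x, y) = \frac{e^{2 \sum_{i=1}^k \beta_i \Delta_{H_i} h(x, y)}}{1 + e^{2 \sum_{i=1}^k \beta_i \Delta_{H_i} h(x, y)}}. \]

Moreover, any maximizing function must be bounded away from 0 and 1.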
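For a constant function $h \equiv u$ in the edge-triangle model, $\Delta_{H_1} h = 1$ and $\Delta_{H_2} h = 3u^2$, so the Euler-Lagrange equation reduces to a scalar fixed-point equation. The sketch below checks it at a numerically computed maximizer of $\beta_1 u + \beta_2 u^3 - I(u)$; the function names and the grid search are assumptions made for illustration.

```python
import math

def ell(u, beta1, beta2):
    """beta1*u + beta2*u^3 - I(u), with 0 log 0 read as 0."""
    entropy = sum(0.5 * w * math.log(w) for w in (u, 1.0 - u) if w > 0.0)
    return beta1 * u + beta2 * u**3 - entropy

def constant_maximizer(beta1, beta2, grid_size=20001):
    """Approximate maximizer of ell on a uniform grid over [0, 1]."""
    grid = (i / (grid_size - 1) for i in range(grid_size))
    return max(grid, key=lambda u: ell(u, beta1, beta2))

def euler_lagrange_residual(u, beta1, beta2):
    """u - e^{2s}/(1 + e^{2s}) with s = beta1 * 1 + beta2 * 3*u^2."""
    s = beta1 + 3.0 * beta2 * u * u
    return u - 1.0 / (1.0 + math.exp(-2.0 * s))

u_hat = constant_maximizer(-0.1, 0.2)
residual = euler_lagrange_residual(u_hat, -0.1, 0.2)
```

The residual vanishes exactly when $\ell'(u) = 0$, so a value near zero at `u_hat` is consistent with the theorem up to grid error.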



Proof. Let $g$ be a symmetric bounded measurable function from $[0, 1]^2$ into $\mathbb{R}$. For each $u \in \mathbb{R}$, let

\[ h_u(x, y) := h(x, y) + u g(x, y). \]

Then $h_u$ is a symmetric bounded measurable function from $[0, 1]^2$ into $\mathbb{R}$. First suppose that $h$ is bounded away from 0 and 1. Then $h_u \in W$ for every $u$ sufficiently small in magnitude. Since $h$ maximizes $T(h) - I(h)$ among all elements of $W$, therefore under the above assumption, for all $u$ sufficiently close to zero,

\[ T(h_u) - I(h_u) \le T(h) - I(h). \]

In particular,

\[ \frac{d}{du} (T(h_u) - I(h_u)) \Big|_{u = 0} = 0. \tag{6.3} \]

It is easy to check that $T(h_u) - I(h_u)$ is differentiable in $u$ for any $h$ and $g$. In particular, the derivative is given by

\[ \frac{d}{du} (T(h_u) - I(h_u)) = \sum_{i=1}^k \beta_i \frac{d}{du} t(H_i, h_u) - \frac{d}{du} I(h_u). \]

Now,

\[ \frac{d}{du} I(h_u) = \frac{1}{2} \iint g(x, y) \log \frac{h_u(x, y)}{1 - h_u(x, y)} \, dy \, dx. \]

Consequently,

\[ \frac{d}{du} I(h_u) \Big|_{u = 0} = \frac{1}{2} \iint g(x, y) \log \frac{h(x, y)}{1 - h(x, y)} \, dy \, dx. \]

Next, note that

\[ \frac{d}{du} t(H_i, h_u) = \sum_{(r, s) \in E(H_i)} \int_{[0,1]^{V(H_i)}} g(x_r, x_s) \prod_{\substack{(r', s') \in E(H_i) \\ (r', s') \ne (r, s)}} h_u(x_{r'}, x_{s'}) \prod_{v \in V(H_i)} dx_v = \iint g(x, y) \Delta_{H_i} h_u(x, y) \, dy \, dx. \]

Combining the above computations and (6.3), we see that for any symmetric bounded measurable $g : [0, 1]^2 \to \mathbb{R}$,

\[ \iint g(x, y) \Bigl( \sum_{i=1}^k \beta_i \Delta_{H_i} h(x, y) - \frac{1}{2} \log \frac{h(x, y)}{1 - h(x, y)} \Bigr) \, dy \, dx = 0. \]

Taking $g(x, y)$ equal to the function within the brackets (which is bounded since $h$ is assumed to be bounded away from 0 and 1), the conclusion of the theorem follows.

Now note that the theorem was proved under the assumption that $h$ is bounded away from 0 and 1. We claim that this is true for any $h$ that maximizes $T(h) - I(h)$. To prove this claim, take any such $h$. Fix $p \in (0, 1)$. For each $u \in [0, 1]$, let

\[ h_{p,u}(x, y) := (1 - u) h(x, y) + u \max\{h(x, y), p\}. \]

Then certainly, $h_{p,u}$ is a symmetric bounded measurable function from $[0, 1]^2$ into $[0, 1]$. Note that

\[ \frac{d}{du} h_{p,u}(x, y) = \max\{h(x, y), p\} - h(x, y) = (p - h(x, y))_+. \]

Using this, an easy computation as above shows that

\[ \frac{d}{du} (T(h_{p,u}) - I(h_{p,u})) \Big|_{u = 0} = \iint \Bigl( \sum_{i=1}^k \beta_i \Delta_{H_i} h(x, y) - \frac{1}{2} \log \frac{h(x, y)}{1 - h(x, y)} \Bigr) (p - h(x, y))_+ \, dy \, dx \]
\[ \ge \iint \Bigl( -C - \frac{1}{2} \log \frac{p}{1 - p} \Bigr) (p - h(x, y))_+ \, dy \, dx, \]

where $C$ is a positive constant depending only on $\beta_1, \ldots, \beta_k$ and $H_1, \ldots, H_k$ (and not on $p$ or $h$). When $h(x, y) = 0$, the integrand is interpreted as $\infty$, and when $h(x, y) = 1$, the integrand is interpreted as 0.

Now, if $p$ is so small that

\[ -C - \frac{1}{2} \log \frac{p}{1 - p} > 0, \]

then the previous display proves that the derivative of $T(h_{p,u}) - I(h_{p,u})$ with respect to $u$ is strictly positive at $u = 0$ if $h < p$ on a set of positive Lebesgue measure. Hence $h$ cannot be a maximizer of $T - I$ unless $h \ge p$ almost everywhere. This proves that any maximizer of $T - I$ must be bounded away from zero. A similar argument shows that it
must be bounded away from 1 and hence completes the proof of the theorem. □

6.3. A solvable case with negative parameters. A $j$-star is an undirected graph with one 'root' vertex and $j$ other vertices connected to the root vertex, with no edges between any of these $j$ vertices. Let $H_j$ be a $j$-star for $j = 1, \ldots, k$. Let $T$ be the sufficient statistic

\[ T(G) := \sum_{j=1}^k \beta_j \, t(H_j, G). \tag{6.4} \]

Theorems 4.1 and 4.2 describe the behavior of this model when $\beta_2, \ldots, \beta_k$ are all non-negative. The following theorem shows that the behavior is the same even if $\beta_2, \ldots, \beta_k$ are all non-positive. This phenomenon for $j$-star models was first observed in simulations by Sukhada Fadnavis.

Theorem 6.4. For the sufficient statistic $T$ defined in (6.4), the conclusions of Theorems 4.1 and 4.2 hold when $\beta_2, \ldots, \beta_k$ are all non-positive.

Proof. Since $\beta_2, \ldots, \beta_k \le 0$ and $I$ is a convex function, note that for any $h \in W$,

\[ T(h) - I(h) = \beta_1 \iint h(x, y) \, dx \, dy + \sum_{j=2}^k \beta_j \int \Bigl( \int h(x, y) \, dy \Bigr)^j dx - \iint I(h(x, y)) \, dx \, dy \]
\[ \le \beta_1 \iint h(x, y) \, dx \, dy + \sum_{j=2}^k \beta_j \Bigl( \iint h(x, y) \, dy \, dx \Bigr)^j - I\Bigl( \iint h(x, y) \, dx \, dy \Bigr) \]
\[ \le \sup_{0 \le u \le 1} (\beta_1 u + \beta_2 u^2 + \cdots + \beta_k u^k - I(u)), \]

with equality holding in all steps if and only if $h$ is identically equal to a constant that solves the maximization problem in the last step. □

Naturally, the question arises as to whether the conclusions of Theorems 4.1 and 4.2 continue to hold for all values of $\beta_1, \ldots, \beta_k$, even when some of them are positive and some negative. As of now, we do not know the answer.



7. Extremal behavior 

In the sections above we have been assuming that $\beta_2, \ldots, \beta_k$ are positive or barely negative. In this section we investigate what happens when $k = 2$ and $\beta_2$ is large and negative. The limits are describable but far from Erdős-Rényi. Our work here is inspired by related results of Sukhada Fadnavis, who has a different argument (using Turán's theorem) for the case of triangles.

Suppose $H$ is any finite simple graph containing at least one edge. Let $T$ be the sufficient statistic

\[ T(G) := \beta_1 t(H_1, G) + \beta_2 t(H, G), \]

where $H_1$ is a single edge, as before.

Let $G_n$ be the exponential random graph on $n$ vertices with this sufficient statistic and let $\psi_n$ be the associated normalizing constant as defined in (3.1). Then Theorem 3.1 gives

\[ \lim_{n \to \infty} \psi_n = \sup_{\tilde h \in \widetilde{W}} (T(\tilde h) - I(\tilde h)) =: \psi, \]

where $I$ is defined in (3.2). We also know (by Theorem 3.2) that

\[ \delta_\square(\widetilde{G}_n, \widetilde{F}^*) \to 0 \quad \text{in probability as } n \to \infty, \]

where $\widetilde{F}^*$ is the subset of $\widetilde{W}$ where $T - I$ is maximized. (Note that $\widetilde{F}^*$ is a closed set since $T - I$ is an upper semicontinuous map.)

We can compute $\widetilde{F}^*$ and $\psi$ when $\beta_2$ is positive, or negative with small magnitude. We are unable to carry out the explicit computation in the case of large negative $\beta_2$, unless $H$ is a convenient object like a $j$-star. However, a qualitative description can still be given by analyzing the behavior of $\widetilde{F}^*$ and $\psi$ as $\beta_2 \to -\infty$. Fixing $\beta_1$, we consider these objects as functions of $\beta_2$ and write $\widetilde{F}^*(\beta_2)$, $\psi(\beta_2)$ and $T_{\beta_2}$ instead of $\widetilde{F}^*$, $\psi$ and $T$.

Theorem 7.1. Fixing $H$ and $\beta_1$, let $\widetilde{F}^*(\beta_2)$ and $\psi(\beta_2)$ be as above. Let $\chi(H)$ be the chromatic number of $H$, and define

\[ g(x, y) := \begin{cases} 1 & \text{if } [(\chi(H) - 1) x] \ne [(\chi(H) - 1) y], \\ 0 & \text{otherwise,} \end{cases} \tag{7.1} \]

where $[x]$ denotes the integer part of a real number $x$. Let $p = e^{2\beta_1}/(1 + e^{2\beta_1})$. Then

\[ \lim_{\beta_2 \to -\infty} \sup_{\tilde f \in \widetilde{F}^*(\beta_2)} \delta_\square(\tilde f, p \tilde g) = 0 \]

and

\[ \lim_{\beta_2 \to -\infty} \psi(\beta_2) = \frac{\chi(H) - 2}{2 (\chi(H) - 1)} \log \frac{1}{1 - p}. \]

Intuitively, the above result means that if $\beta_2$ is a large negative number and $n$ is large, then an exponential random graph $G_n$ with sufficient statistic $T$ looks roughly like a complete $(\chi(H) - 1)$-equipartite graph with $1 - p$ fraction of edges randomly deleted, where $p = e^{2\beta_1}/(1 + e^{2\beta_1})$. In particular, if $H$ is bipartite, then $G_n$ must be very sparse, since a 1-equipartite graph has no edges. Figure 8 gives a simulation result for the triangle model with large negative $\beta_2$.
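The limiting object of Theorem 7.1 is explicit enough to write down directly. The sketch below encodes the block function $g$ from (7.1) and the limiting value of $\psi(\beta_2)$ in the form $\frac{\chi(H) - 2}{2(\chi(H) - 1)} \log \frac{1}{1 - p}$; the function names are illustrative, and `int` is used for the integer part on $[0, 1)$.

```python
import math

def extremal_g(x, y, chromatic_number):
    """g(x, y) from (7.1): 1 if x and y fall in different blocks, else 0."""
    m = chromatic_number - 1
    return 1 if int(m * x) != int(m * y) else 0

def limiting_psi(beta1, chromatic_number):
    """Assumed limit of psi(beta_2) as beta_2 -> -infinity."""
    p = math.exp(2.0 * beta1) / (1.0 + math.exp(2.0 * beta1))
    r = chromatic_number
    return (r - 2) / (2.0 * (r - 1)) * math.log(1.0 / (1.0 - p))

def limiting_edge_density(beta1, chromatic_number):
    """Edge density of p*g: p times the measure of the set where g = 1."""
    p = math.exp(2.0 * beta1) / (1.0 + math.exp(2.0 * beta1))
    return p * (1.0 - 1.0 / (chromatic_number - 1))

triangle_limit = limiting_psi(0.0, 3)            # triangle: chi(H) = 3
bipartite_density = limiting_edge_density(0.0, 2)  # bipartite H: chi(H) = 2
```

For bipartite $H$ the edge density of the limit is zero, matching the remark that $G_n$ must then be very sparse.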



Theorem 7.1 is closely related to the Erdős-Stone theorem from extremal graph theory (or equivalently, Turán's theorem in the case of triangles as in the work of Fadnavis). Indeed, it may be possible to prove some parts of our theorem using the Erdős-Stone theorem, but we prefer the bare-hands argument given below. Due to this connection with extremal graph theory, we refer to behavior of the graph in the 'large negative $\beta_2$' domain as extremal behavior.

Figure 8. A simulated realization of the exponential random graph model on 20 nodes with edges and triangles as sufficient statistics, where $\beta_1 = 120$ and $\beta_2 = -400$. (Picture by Sukhada Fadnavis.)

Lemma 7.2. Let $r$ be any integer $\ge \chi(H)$. Let $K_r$ be the complete graph on $r$ vertices. Then for any symmetric measurable $h : [0, 1]^2 \to \{0, 1\}$, if $t(K_r, h) > 0$ then $t(H, h) > 0$.

Proof. Let $h_n(x, y)$ be the average value of $h$ in the dyadic square of width $2^{-n}$ containing the point $(x, y)$. A standard martingale argument implies that the sequence of functions $\{h_n\}_{n \ge 1}$ converges to $h$ almost everywhere. For any positive integer $u$, let $K_r^u$ denote the complete $r$-partite graph on $ru$ vertices, where each partition consists of $u$ vertices (so that $K_r^1 = K_r$). Since $r \ge \chi(H)$, it is easy to see that there exists $u$ so large that $H$ is a subgraph of $K_r^u$. Fix such a $u$.

By the almost everywhere convergence of $h_n$ to $h$ and the assumption that $t(K_r, h) > 0$, there is a set of $r$ distinct points $x_1, \ldots, x_r \in [0, 1]$ such that $h(x_i, x_j) > 0$ and $\lim_{n \to \infty} h_n(x_i, x_j) = h(x_i, x_j)$ for each $1 \le i \ne j \le r$. Since $h$ is $\{0, 1\}$-valued, $h(x_i, x_j) = 1$ for each $i \ne j$. Choose $n$ so large that for each $i \ne j$,

\[ h_n(x_i, x_j) \ge 1 - \epsilon, \]

where $\epsilon = 1/(2ru)$. Let $(X_i^s)_{1 \le i \le r, \, 1 \le s \le u}$ be independent random variables, where $X_i^s$ is uniformly distributed in the dyadic interval of width $2^{-n}$ containing $x_i$. Then for each $1 \le i \ne j \le r$, $1 \le q, s \le u$,

\[ P(h(X_i^q, X_j^s) = 1) = h_n(x_i, x_j) \ge 1 - \epsilon. \]

Therefore,

\[ P(h(X_i^q, X_j^s) = 1 \text{ for all } 1 \le i \ne j \le r, \ 1 \le q, s \le u) \ge 1 - ru\epsilon = 1/2. \]

Let $(Y_i^s)_{1 \le i \le r, \, 1 \le s \le u}$ be independent random variables uniformly distributed in $[0, 1]$. Conditional on the event that $Y_i^s$ belongs to the dyadic interval of width $2^{-n}$ containing $x_i$, $Y_i^s$ has the same distribution as $X_i^s$. This shows that

\[ t(K_r^u, h) \ge P(h(Y_i^q, Y_j^s) = 1 \text{ for all } 1 \le i \ne j \le r, \ 1 \le q, s \le u) > 0. \]

Since $H$ is a subgraph of $K_r^u$, therefore $t(H, h) > 0$. □



Theorem 7.3. Let $g$ be the function defined in (7.1). Take any $p \in (0,1)$. If $f$ is any
element of $\widetilde{\mathcal{W}}$ that minimizes $I_p(f)$ among all $f$ satisfying $t(H, f) = 0$, then $f = pg$.

Proof. Take any minimizer $f$. (Minimizers exist due to the Lovász-Szegedy compactness
theorem and the lower semicontinuity of $I_p$.) First, note that $f \le p$ almost everywhere: if
not, then $I_p(f)$ can be decreased by replacing $f$ with $\min\{f, p\}$, which retains the condition
$t(H, f) = 0$.

Next, note that for almost all $x, y$, $f(x, y) = 0$ or $p$. If not, then redefine $f$ to be equal
to $p$ wherever $f$ was positive. This decreases the entropy while retaining the condition
$t(H, f) = 0$.

Let $h = f/p$. Then $h$ takes value $0$ or $1$ almost everywhere and $h$ maximizes $\iint h(x, y)\,dx\,dy$
among all symmetric measurable $h : [0,1]^2 \to \{0,1\}$ satisfying $t(H, h) = 0$. Our goal is to
show that $h = g$.

Let $r := \chi(H)$. Let $X_0, X_1, X_2, \ldots$ be a sequence of i.i.d. random variables uniformly
distributed in $[0,1]$. Let

$$\mathcal{R} := \{i : h(X_i, X_j) = 1 \text{ for all } 1 \le j < i\},$$

and let $R := |\mathcal{R}|$. Let $\lambda(x) := \int h(x, y)\,dy$, so that for any given $i$,

$$\mathbb{P}(h(X_i, X_j) = 1 \text{ for all } 1 \le j < i) = \mathbb{E}(\lambda(X_i)^{i-1}) = \mathbb{E}(\lambda(X_0)^{i-1}).$$

Thus,

$$
\begin{aligned}
\mathbb{E}(R) &= \sum_{i=1}^{\infty} \mathbb{P}(h(X_i, X_j) = 1 \text{ for all } 1 \le j < i) \\
&= \sum_{i=1}^{\infty} \mathbb{E}(\lambda(X_0)^{i-1}) \\
&\ge \sum_{i=1}^{\infty} (\mathbb{E}\lambda(X_0))^{i-1} = \frac{1}{1 - \mathbb{E}\lambda(X_0)} = \frac{1}{1 - \iint h(x, y)\,dx\,dy}.
\end{aligned}
\tag{7.2}
$$
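
The two steps behind (7.2) can be spelled out as follows. This is only an expanded restatement of the display above; the symbol $m$ is introduced here for $\iint h$ and does not appear in the original.

```latex
% Conditionally on X_i, the variables h(X_i, X_j), 1 <= j < i, are i.i.d.
% Bernoulli with mean \lambda(X_i), which gives the first identity:
\mathbb{P}\bigl(h(X_i,X_j)=1 \text{ for all } 1\le j<i \,\big|\, X_i\bigr) = \lambda(X_i)^{i-1}.
% The map s \mapsto s^{i-1} is convex on [0,1], so by Jensen's inequality
\mathbb{E}\bigl(\lambda(X_0)^{i-1}\bigr) \ge \bigl(\mathbb{E}\lambda(X_0)\bigr)^{i-1} = m^{i-1},
\qquad m := \iint h(x,y)\,dx\,dy .
% Summing the geometric series, for m < 1:
\sum_{i=1}^{\infty} m^{i-1} = \frac{1}{1-m}.
```

Equality in the Jensen step for every $i$ is what later forces $\lambda(X_0)$ to be almost surely constant.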



Let $g$ be the function defined in (7.1). Suppose the vertex set of $H$ is $\{1, \ldots, k\}$ for some
integer $k$. If $t(H, g) > 0$, then there exist $x_1, \ldots, x_k$ such that $g(x_i, x_j) = 1$ whenever $\{i, j\}$
is an edge in $H$. By the nature of $g$, this implies that $H$ can be colored by $r - 1$ colors;
since this is false, $t(H, g)$ must be zero. By the optimality property of $h$, this
gives

$$\iint h(x, y)\,dx\,dy \ge \iint g(x, y)\,dx\,dy = 1 - \frac{1}{r - 1}.$$

Therefore by (7.2),

$$\mathbb{E}(R) \ge r - 1.$$

Again by Lemma 7.2 (in contrapositive form, since $t(H, h) = 0$), $t(K_r, h) = 0$. The points
indexed by $\mathcal{R}$ are pairwise joined under $h$, so $R \le r - 1$ almost surely. Combined with
the above display, this shows that equality must hold in (7.2) and $R = r - 1$ almost surely.
In particular, $\mathbb{E}(\lambda(X_0)^2) = (\mathbb{E}\lambda(X_0))^2$ and $\mathbb{E}\lambda(X_0) = 1 - 1/(r - 1)$, which shows that

$$\lambda(x) = 1 - \frac{1}{r - 1} \quad \text{a.e.}$$

For each $x$, let $A(x) := \{y : h(x, y) = 0\}$. Then $|A(x)| = 1/(r - 1)$ a.e., where $|A(x)|$
denotes the Lebesgue measure of $A(x)$.
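
To see the set $\mathcal{R}$ in action, the following sketch simulates it in the extremal case $h = g$. It assumes that $g$ from (7.1) equals 1 exactly when $x$ and $y$ fall in different intervals of length $1/(r - 1)$ (the definition lies outside this section), truncates the i.i.d. sequence to finitely many samples, and uses hypothetical function names.

```python
import random

def g(x, y, r):
    """Assumed form of (7.1): 1 iff x and y lie in different intervals
    of length 1/(r-1), else 0."""
    m = r - 1
    bx = min(int(m * x), m - 1)   # index of the interval containing x
    by = min(int(m * y), m - 1)   # index of the interval containing y
    return 1 if bx != by else 0

def clique_indices(xs, h):
    """Indices i (0-based) with h(xs[i], xs[j]) = 1 for every earlier j."""
    return [i for i in range(len(xs))
            if all(h(xs[i], xs[j]) == 1 for j in range(i))]

r = 4
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]   # truncated uniform sequence
R_set = clique_indices(xs, lambda x, y: g(x, y, r))
R = len(R_set)
```

Under this assumption an index enters `R_set` exactly when its sample is the first to land in its interval, so `R` counts the intervals visited and cannot exceed $r - 1$.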

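Define a random graph $G$ on $\{0, 1, 2, \ldots\}$ by including the edge $\{i, j\}$ if and only if
$h(X_i, X_j) = 1$. Since $t(K_r, h) = 0$, $G$ cannot contain any copy of $K_r$. Thus, with
probability 1, $h(X_0, X_i) = 0$ for some $i \in \mathcal{R}$. In other words, $\bigcup_{i \in \mathcal{R}} A(X_i)$ covers almost all
of $[0,1]$. Again, $|A(X_i)| = 1/(r - 1)$ for all $i \in \mathcal{R}$ and $|\mathcal{R}| = r - 1$ almost surely. All this
together implies that with probability 1, $A(X_i) \cap A(X_j)$ has Lebesgue measure zero for all
$i \ne j \in \mathcal{R}$, since for each such pair

$$|A(X_i) \cap A(X_j)| \le \sum_{k \in \mathcal{R}} |A(X_k)| - \Bigl|\bigcup_{k \in \mathcal{R}} A(X_k)\Bigr| = 0.$$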

Let $Y_1, Y_2, \ldots$ and $Z_1, Z_2, \ldots$ be i.i.d. random variables uniformly distributed in $[0,1]$ that
are independent of the sequence $X_1, X_2, \ldots$. Since $t(K_r, h) = 0$, with probability 1 there
cannot exist $l$ and a set $B$ of integers of size $r - 2$ such that $h(Y_l, X_i) = h(Z_l, X_i) = 1$ for
all $i \in B$, $h(X_i, X_j) = 1$ for all $i \ne j \in B$, and $h(Y_l, Z_l) = 1$.

Now fix a realization of $X_1, X_2, \ldots$. This fixes the set $\mathcal{R}$. Take any $i \in \mathcal{R}$. Let $l$
be the smallest integer such that both $Y_l$ and $Z_l$ are in $A(X_i)$. Clearly $Y_l$ and $Z_l$ are
independent and uniformly distributed in $A(X_i)$, conditional on the sequence $X_1, X_2, \ldots$
and our choice of $i \in \mathcal{R}$. By the observation from the preceding paragraph, $h(Y_l, Z_l) = 0$
with probability 1, since the set $\mathcal{R} \setminus \{i\}$ serves the role of $B$.

This shows that given $X_1, X_2, \ldots$, the sets $A(X_i)$ have the property that for almost all
$y, z \in A(X_i)$, $h(y, z) = 0$. Since $\lambda(x) = 1 - 1/(r - 1)$ a.e. and $|A(X_i)| = 1/(r - 1)$, this
shows that for almost all $y \in A(X_i)$ and almost all $z \notin A(X_i)$, $h(y, z) = 1$.

The properties of $(A(X_i))_{i \in \mathcal{R}}$ that we established can be summarized as follows: the
sets $A(X_i)$ are disjoint up to errors of measure zero; each $A(X_i)$ has Lebesgue measure
$1/(r - 1)$ and together they cover the whole of $[0,1]$; for almost all $y, z \in [0,1]$, $h(y, z) = 0$
if they belong to the same $A(X_i)$, and $h(y, z) = 1$ if $y \in A(X_i)$ and $z \in A(X_j)$ for some
$i \ne j$. These properties immediately show that $h$ is the same as the function $g$ up to a
rearrangement; the formal argument can be completed as follows.

Given $X_1, X_2, \ldots$, let $u : [0,1] \to \mathcal{R}$ be the map defined as

$$u(x) := \text{minimum } i \in \mathcal{R} \text{ such that } x \in A(X_i).$$

Note that with probability 1, for almost all $x$ there is a unique $i \in \mathcal{R}$ such that
$x \in A(X_i)$. Let $\sigma : [0,1] \to [0,1]$ be a measure-preserving bijection such that $x \mapsto u(\sigma x)$ is
non-increasing (we omit the construction). Then $\sigma$ maps the intervals $[0, 1/(r - 1)]$,
$[1/(r - 1), 2/(r - 1)], \ldots, [(r - 2)/(r - 1), 1]$ onto the sets $(A(X_i))_{i \in \mathcal{R}}$ up to errors of
measure zero. By the properties of $A(X_i)$ established above, this shows that $h(\sigma x, \sigma y)$ is
the same as $g(x, y)$ up to an error of measure zero. □



Proof of Theorem 7.1. First, note that

$$T_{\beta_2}(h) - I(h) = \beta_2\, t(H, h) - I_p(h) - \tfrac{1}{2}\log(1 - p),$$

where $p = e^{2\beta_1}/(1 + e^{2\beta_1})$. Take a sequence $\beta_2^{(n)} \to -\infty$, and for each $n$, let $h_n$ be an
element of $F^*(\beta_2^{(n)})$. Let $h$ be a limit point of $h_n$ in $\widetilde{\mathcal{W}}$. If $t(H, h) > 0$, then by the
continuity of the map $t(H, \cdot)$ and the boundedness of $I_p$, along the subsequence converging to $h$,

$$\lim_{n \to \infty} \psi(\beta_2^{(n)}) = -\infty.$$

But this is impossible since $\psi(\beta_2^{(n)})$ is uniformly bounded below, as can be easily seen by
considering the function $g$ defined in (7.1) as a test function in the variational problem.
Thus, $t(H, h) = 0$. If $f$ is a function such that $t(H, f) = 0$ and $I_p(f) < I_p(h)$, then for all
sufficiently large $n$,

$$T_{\beta_2^{(n)}}(h_n) - I(h_n) < T_{\beta_2^{(n)}}(f) - I(f),$$

contradicting the definition of $F^*(\beta_2^{(n)})$. Thus, if $f$ is a function such that $t(H, f) = 0$, then
$I_p(f) \ge I_p(h)$. By Theorem 7.3, this shows that $h = pg$. The compactness of $\widetilde{\mathcal{W}}$ now
proves the first part of the theorem.
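For the second part, first note that

$$
\begin{aligned}
\liminf_{n \to \infty} \psi(\beta_2^{(n)}) &\ge \lim_{n \to \infty} \bigl(T_{\beta_2^{(n)}}(pg) - I(pg)\bigr) \\
&= -I_p(pg) - \tfrac{1}{2}\log(1 - p) \\
&= \frac{\chi(H) - 2}{2(\chi(H) - 1)} \log\frac{1}{1 - p}.
\end{aligned}
$$

Next, note that by the lower semicontinuity of $I_p$ and the fact that $\beta_2^{(n)}$ is eventually
negative,

$$
\begin{aligned}
\limsup_{n \to \infty} \psi(\beta_2^{(n)}) &= \limsup_{n \to \infty} \bigl(\beta_2^{(n)} t(H, h_n) - I_p(h_n)\bigr) - \tfrac{1}{2}\log(1 - p) \\
&\le \limsup_{n \to \infty} \bigl(-I_p(h_n)\bigr) - \tfrac{1}{2}\log(1 - p) \\
&\le -I_p(pg) - \tfrac{1}{2}\log(1 - p).
\end{aligned}
$$

The proof is complete. □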
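
The closed form of the limit can be checked numerically. The sketch below assumes the convention $I_p(u) = \tfrac{1}{2}u\log(u/p) + \tfrac{1}{2}(1-u)\log((1-u)/(1-p))$ with $0 \log 0 = 0$, and that $g$ from (7.1) vanishes on $\chi(H) - 1$ diagonal blocks of total area $1/(\chi(H) - 1)$ and equals 1 elsewhere; both are defined outside this section, and the function names are hypothetical.

```python
import math

def I_p(u, p):
    """Assumed entropy term: (1/2)[u log(u/p) + (1-u) log((1-u)/(1-p))]."""
    def xlog(a, b):
        # a * log(a / b) with the convention 0 * log 0 = 0
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return 0.5 * (xlog(u, p) + xlog(1.0 - u, 1.0 - p))

def closed_form(chi, p):
    """(chi - 2) / (2 (chi - 1)) * log(1 / (1 - p))."""
    return (chi - 2) / (2 * (chi - 1)) * math.log(1 / (1 - p))

def via_entropy(chi, p):
    """-I_p(pg) - (1/2) log(1 - p), integrating I_p over the blocks of pg."""
    m = chi - 1
    # pg equals 0 on area 1/m (same block) and p on area 1 - 1/m.
    entropy = (1 / m) * I_p(0.0, p) + (1 - 1 / m) * I_p(p, p)
    return -entropy - 0.5 * math.log(1 - p)

# Largest discrepancy between the two expressions over a small grid.
max_gap = max(abs(closed_form(chi, p) - via_entropy(chi, p))
              for chi in range(2, 8) for p in (0.1, 0.5, 0.9))
```

For $\chi(H) = 2$ both expressions are zero, and for $\chi(H) = 3$, $p = 1/2$ they equal $\tfrac{1}{4}\log 2$.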

8. Transitivity and clumping 

In the social networks literature, one of the key motivations for considering exponential
random graphs is to develop models of random graphs that exhibit 'transitivity'. In
simple terms, this means that a friend of a friend is more likely to be a friend than a
random person. The presence of transitivity gives rise to 'clumps' of nodes that have higher
connectivity between themselves. Since transitivity is closely related to the presence of
'triads' (i.e., triangles), researchers initially tried to model transitivity by the exponential
random graph with edges and triangles as sufficient statistics. Sometimes, $j$-stars were
thrown in for additional effect. For a history of such attempts and their experimental
outcomes, see the discussion in Snijders et al. [51].



However, as seen in experiments and through heuristics [36] and proved in Theorems 4.1
and 7.1, it is futile to model transitivity with only edges and triangles as sufficient statistics.
If $\beta_2$ is positive, the graph essentially behaves like an Erdős-Rényi graph, while if $\beta_2$
is negative, it becomes roughly bipartite. The degeneracy observed in experiments and
proved in Theorem 5.1 also renders this model quite useless.

Recently, Snijders et al. [51] have suggested a certain class of models that exhibit the
desired transitivity and clumping properties in simulations. These models are of the type
(4.1), where $H_j$ is a $j$-star (or '$j$-triangle', as defined in [51]) for $j = 1, \ldots, k - 1$ and $H_k$ is
a triangle. The crucial assumption is that the parameters $\beta_1, \beta_2, \ldots, \beta_k$ have alternating
signs. Usually, there is a single unknown parameter $\lambda$ and $\beta_j$ is taken to be $(-1)^{j-1}\lambda^{-j}$
for $j = 1, \ldots, k - 1$. Based on simulations and heuristics, the authors of [51] claim that
this class of models should demonstrate transitivity and clumping properties.

Although we do not yet have a general understanding as to why alternating sign models
should give rise to transitivity, we can prove it in a certain special case. In this model,
$k = 3$ and $H_1$ = a single edge, $H_2$ = a 2-star and $H_3$ = a triangle. There is a single
unknown (positive) parameter $\beta$, and the sufficient statistic is defined as

$$T(f) := 3\beta\, t(H_1, f) - 3\beta\, t(H_2, f) + \beta\, t(H_3, f).$$



Let $F^* = F^*(\beta)$ be as in Theorem 3.2. Of course, if $\beta$ is sufficiently small, $F^*(\beta)$ consists
of a single constant function (and hence the model is effectively Erdős-Rényi) by Theorem
6.1. However, as the following theorem shows, all elements of $F^*(\beta)$ exhibit two clumps of
roughly equal size when $\beta$ is large.

Theorem 8.1. In the setting described above,

$$\lim_{\beta \to \infty} \sup_{f \in F^*(\beta)} \delta_\square(f, h) = 0,$$

where

$$h(x, y) = \begin{cases} 1 & \text{if } x, y \text{ on same side of } 1/2, \\ 1/2 & \text{if not.} \end{cases}$$

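Proof. Note that $T(f)$ can be alternately written as a constant plus $S(1 - f)$, where

$$S(g) := -\beta\, t(H_3, g).$$

Indeed, expanding the product $(1 - f(x,y))(1 - f(y,z))(1 - f(z,x))$ and integrating gives
$t(H_3, 1 - f) = 1 - 3t(H_1, f) + 3t(H_2, f) - t(H_3, f)$, so that $T(f) = \beta + S(1 - f)$; moreover,
$I(1 - f) = I(f)$. The proof is now complete by Theorem 7.1 applied to the model with sufficient
statistic $S$, which has no edge term, so that $p = 1/2$. □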
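
The rewriting of $T(f)$ used in the proof can be checked numerically on a step function. The sketch below is only an illustration with hypothetical helper names; it evaluates $t(H, f)$ exactly for a symmetric block matrix of values on equal intervals and compares $T(f)$ with $\beta + S(1 - f)$.

```python
from itertools import product
import random

def hom_density(edges, k, blocks, weights):
    """t(H, f) for a step function f given by a symmetric block matrix."""
    m = len(weights)
    total = 0.0
    for assign in product(range(m), repeat=k):
        term = 1.0
        for v in assign:
            term *= weights[v]
        for a, b in edges:
            term *= blocks[assign[a]][assign[b]]
        total += term
    return total

EDGE = [(0, 1)]
TWO_STAR = [(0, 1), (1, 2)]
TRIANGLE = [(0, 1), (1, 2), (0, 2)]

def T(beta, blocks, weights):
    return (3 * beta * hom_density(EDGE, 2, blocks, weights)
            - 3 * beta * hom_density(TWO_STAR, 3, blocks, weights)
            + beta * hom_density(TRIANGLE, 3, blocks, weights))

def S(beta, blocks, weights):
    return -beta * hom_density(TRIANGLE, 3, blocks, weights)

def complement(blocks):
    return [[1.0 - v for v in row] for row in blocks]

# A random symmetric step function f on four equal intervals.
rng = random.Random(0)
m = 4
f_blocks = [[0.0] * m for _ in range(m)]
for i in range(m):
    for j in range(i, m):
        f_blocks[i][j] = f_blocks[j][i] = rng.random()
weights = [1.0 / m] * m

beta = 2.5
lhs = T(beta, f_blocks, weights)
rhs = beta + S(beta, complement(f_blocks), weights)

# The limiting h of Theorem 8.1: 1 within each half, 1/2 across halves.
h_blocks = [[1.0, 0.5], [0.5, 1.0]]
s_at_h = S(beta, complement(h_blocks), [0.5, 0.5])
```

For the limiting $h$, the complement $1 - h$ is supported on pairs from opposite halves and therefore has triangle density zero, which is the largest value $S$ can take.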

Intuitively, the function $h$ in the above theorem represents connectivities between people
in a population divided into two equal parts, say Democrats and Republicans, where all
Democrats are friends with each other, as are all Republicans, and there is a probability
$1/2$ of friendship between a Democrat and a Republican. It is clear that this arrangement
automatically gives rise to transitivity.